2023-05-18 20:25:35,195 INFO [finetune.py:1062] (0/2) Training started 2023-05-18 20:25:35,198 INFO [finetune.py:1072] (0/2) Device: cuda:0 2023-05-18 20:25:35,200 INFO [finetune.py:1081] (0/2) {'frame_shift_ms': 10.0, 'allowed_excess_duration_ratio': 0.1, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'a23383c5a381713b51e9014f3f05d096f8aceec3', 'k2-git-date': 'Wed Apr 26 15:33:33 2023', 'lhotse-version': '1.14.0.dev+git.b61b917.dirty', 'torch-version': '1.13.1', 'torch-cuda-available': True, 'torch-cuda-version': '11.6', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': '45c13e9-dirty', 'icefall-git-date': 'Mon Apr 24 15:00:02 2023', 'icefall-path': '/k2-dev/yangyifan/icefall-master', 'k2-path': '/k2-dev/yangyifan/anaconda3/envs/icefall/lib/python3.10/site-packages/k2-1.23.4.dev20230427+cuda11.6.torch1.13.1-py3.10-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/k2-dev/yangyifan/anaconda3/envs/icefall/lib/python3.10/site-packages/lhotse-1.14.0.dev0+git.b61b917.dirty-py3.10.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-0221105906-5745685d6b-t8zzx', 'IP address': '10.177.57.19'}, 'world_size': 2, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 18, 'start_batch': 0, 'exp_dir': PosixPath('pruned_transducer_stateless7/exp_giga_finetune'), 'bpe_model': 'icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11/data/lang_bpe_500/bpe.model', 'base_lr': 0.005, 'lr_batches': 100000.0, 'lr_epochs': 100.0, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'do_finetune': True, 'use_mux': True, 'init_modules': None, 'finetune_ckpt': None, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 500, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'subset': 'S', 'small_dev': False, 'blank_id': 0, 'vocab_size': 500} 2023-05-18 20:25:35,200 INFO [finetune.py:1083] (0/2) About to create model 2023-05-18 20:25:35,866 INFO [zipformer.py:178] (0/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8. 
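The per-batch loss lines that appear throughout the rest of this log report three numbers: loss, simple_loss and pruned_loss. The logged values are consistent with the total being a weighted sum of the two components using the simple_loss_scale=0.5 from the parameter dump above. A minimal Python sketch of that relation follows; the function name is illustrative only and is not the icefall training code.

    # Sketch only: reproduces the relation visible in the logged numbers,
    # assuming simple_loss_scale=0.5 as in the parameter dump above.
    def combine_transducer_losses(simple_loss, pruned_loss, simple_loss_scale=0.5):
        # Weighted sum of the un-pruned ("simple") and pruned RNN-T loss terms.
        return simple_loss_scale * simple_loss + pruned_loss

    # Epoch 18, batch 0 below logs loss=0.3848, simple_loss=0.4208, pruned_loss=0.1744:
    print(combine_transducer_losses(0.4208, 0.1744))    # 0.3848
    # Epoch 18, batch 50 logs loss=0.1504, simple_loss=0.2392, pruned_loss=0.03077:
    print(combine_transducer_losses(0.2392, 0.03077))   # ~0.1504

The same relation holds for the aggregated tot_loss entries reported alongside the per-batch values.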
2023-05-18 20:25:35,881 INFO [finetune.py:1087] (0/2) Number of model parameters: 70369391 2023-05-18 20:25:36,345 INFO [checkpoint.py:112] (0/2) Loading checkpoint from pruned_transducer_stateless7/exp_giga_finetune/epoch-17.pt 2023-05-18 20:25:38,975 INFO [checkpoint.py:131] (0/2) Loading averaged model 2023-05-18 20:25:42,043 INFO [finetune.py:1109] (0/2) Using DDP 2023-05-18 20:25:42,250 INFO [finetune.py:1129] (0/2) Loading optimizer state dict 2023-05-18 20:25:42,686 INFO [finetune.py:1137] (0/2) Loading scheduler state dict 2023-05-18 20:25:42,686 INFO [asr_datamodule.py:425] (0/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts 2023-05-18 20:25:42,703 INFO [gigaspeech.py:389] (0/2) About to get train_S cuts 2023-05-18 20:25:42,703 INFO [gigaspeech.py:216] (0/2) Enable MUSAN 2023-05-18 20:25:42,703 INFO [gigaspeech.py:217] (0/2) About to get Musan cuts 2023-05-18 20:25:44,985 INFO [gigaspeech.py:241] (0/2) Enable SpecAugment 2023-05-18 20:25:44,985 INFO [gigaspeech.py:242] (0/2) Time warp factor: 80 2023-05-18 20:25:44,985 INFO [gigaspeech.py:252] (0/2) Num frame mask: 10 2023-05-18 20:25:44,985 INFO [gigaspeech.py:265] (0/2) About to create train dataset 2023-05-18 20:25:44,986 INFO [gigaspeech.py:291] (0/2) Using DynamicBucketingSampler. 2023-05-18 20:25:49,431 INFO [gigaspeech.py:306] (0/2) About to create train dataloader 2023-05-18 20:25:49,432 INFO [gigaspeech.py:396] (0/2) About to get dev cuts 2023-05-18 20:25:49,440 INFO [gigaspeech.py:337] (0/2) About to create dev dataset 2023-05-18 20:25:49,772 INFO [gigaspeech.py:354] (0/2) About to create dev dataloader 2023-05-18 20:25:49,772 INFO [finetune.py:1225] (0/2) Loading grad scaler state dict 2023-05-18 20:26:07,143 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2368, 4.8841, 5.2249, 4.6622, 4.9627, 4.8359, 5.2681, 4.8787], device='cuda:0'), covar=tensor([0.0291, 0.0372, 0.0274, 0.0300, 0.0419, 0.0359, 0.0192, 0.0281], device='cuda:0'), in_proj_covar=tensor([0.0259, 0.0264, 0.0285, 0.0262, 0.0258, 0.0257, 0.0235, 0.0209], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:26:08,258 WARNING [optim.py:388] (0/2) Scaling gradients by 0.045329876244068146, model_norm_threshold=726.4490966796875 2023-05-18 20:26:08,394 INFO [optim.py:450] (0/2) Parameter Dominanting tot_sumsq module.encoder.encoders.3.out_combiner.weight1 with proportion 0.72, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.848e+08, grad_sumsq = 1.848e+08, orig_rms_sq=1.000e+00 2023-05-18 20:26:08,431 INFO [finetune.py:992] (0/2) Epoch 18, batch 0, loss[loss=0.3848, simple_loss=0.4208, pruned_loss=0.1744, over 12294.00 frames. ], tot_loss[loss=0.3848, simple_loss=0.4208, pruned_loss=0.1744, over 12294.00 frames. ], batch size: 34, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:26:08,432 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 20:26:25,050 INFO [finetune.py:1026] (0/2) Epoch 18, validation: loss=0.2903, simple_loss=0.3616, pruned_loss=0.1095, over 1020973.00 frames. 2023-05-18 20:26:25,050 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 11141MB 2023-05-18 20:26:28,641 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.96 vs. 
limit=5.0 2023-05-18 20:26:38,417 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 3.327e+02 3.889e+02 4.846e+02 1.603e+04, threshold=7.779e+02, percent-clipped=2.0 2023-05-18 20:26:39,266 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=307998.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:26:41,028 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-208000.pt 2023-05-18 20:27:05,540 INFO [finetune.py:992] (0/2) Epoch 18, batch 50, loss[loss=0.1504, simple_loss=0.2392, pruned_loss=0.03077, over 12329.00 frames. ], tot_loss[loss=0.1772, simple_loss=0.2661, pruned_loss=0.04408, over 534452.84 frames. ], batch size: 31, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:27:08,541 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1413, 3.8268, 3.9525, 4.2875, 2.9744, 3.8488, 2.4260, 3.9792], device='cuda:0'), covar=tensor([0.2067, 0.1028, 0.1166, 0.0823, 0.1373, 0.0783, 0.2358, 0.1041], device='cuda:0'), in_proj_covar=tensor([0.0225, 0.0262, 0.0288, 0.0345, 0.0237, 0.0237, 0.0254, 0.0356], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:27:17,003 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308046.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:27:17,040 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3887, 6.1266, 5.6700, 5.7427, 6.1805, 5.5308, 5.6568, 5.6607], device='cuda:0'), covar=tensor([0.1494, 0.0997, 0.1246, 0.1975, 0.1111, 0.2586, 0.2053, 0.1383], device='cuda:0'), in_proj_covar=tensor([0.0357, 0.0497, 0.0405, 0.0443, 0.0459, 0.0435, 0.0399, 0.0385], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:27:18,069 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-05-18 20:27:41,408 INFO [finetune.py:992] (0/2) Epoch 18, batch 100, loss[loss=0.1617, simple_loss=0.2501, pruned_loss=0.03668, over 12346.00 frames. ], tot_loss[loss=0.175, simple_loss=0.2654, pruned_loss=0.04231, over 951676.20 frames. ], batch size: 31, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:27:53,418 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.648e+02 3.135e+02 3.749e+02 6.822e+02, threshold=6.269e+02, percent-clipped=0.0 2023-05-18 20:28:01,125 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-05-18 20:28:01,564 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 20:28:16,087 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8284, 4.7195, 4.6602, 4.7137, 4.4502, 4.8813, 4.8860, 5.0623], device='cuda:0'), covar=tensor([0.0270, 0.0200, 0.0223, 0.0402, 0.0769, 0.0291, 0.0179, 0.0205], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0186, 0.0180, 0.0233, 0.0225, 0.0207, 0.0166, 0.0220], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2023-05-18 20:28:17,245 INFO [finetune.py:992] (0/2) Epoch 18, batch 150, loss[loss=0.1902, simple_loss=0.2842, pruned_loss=0.04812, over 11580.00 frames. ], tot_loss[loss=0.1743, simple_loss=0.265, pruned_loss=0.04181, over 1257593.71 frames. 
], batch size: 48, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:28:52,823 INFO [finetune.py:992] (0/2) Epoch 18, batch 200, loss[loss=0.166, simple_loss=0.2542, pruned_loss=0.03885, over 12151.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2619, pruned_loss=0.04016, over 1508532.93 frames. ], batch size: 29, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:29:04,739 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.636e+02 3.102e+02 3.454e+02 6.960e+02, threshold=6.205e+02, percent-clipped=1.0 2023-05-18 20:29:08,688 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308202.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:29:27,946 INFO [finetune.py:992] (0/2) Epoch 18, batch 250, loss[loss=0.1536, simple_loss=0.2399, pruned_loss=0.03368, over 12303.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2594, pruned_loss=0.03905, over 1708577.90 frames. ], batch size: 33, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:29:44,728 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308254.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:29:48,949 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308260.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:29:51,150 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308263.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:00,168 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308276.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:03,386 INFO [finetune.py:992] (0/2) Epoch 18, batch 300, loss[loss=0.1723, simple_loss=0.2803, pruned_loss=0.03213, over 12123.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2585, pruned_loss=0.03876, over 1855675.04 frames. ], batch size: 38, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:30:15,170 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.684e+02 3.202e+02 3.924e+02 5.708e+02, threshold=6.404e+02, percent-clipped=0.0 2023-05-18 20:30:18,638 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308302.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:23,315 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308308.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:24,120 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308309.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:34,574 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308324.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:38,489 INFO [finetune.py:992] (0/2) Epoch 18, batch 350, loss[loss=0.1431, simple_loss=0.2301, pruned_loss=0.02808, over 12286.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.258, pruned_loss=0.03847, over 1963994.18 frames. ], batch size: 33, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:31:06,514 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308370.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:31:13,290 INFO [finetune.py:992] (0/2) Epoch 18, batch 400, loss[loss=0.1392, simple_loss=0.2198, pruned_loss=0.02933, over 12283.00 frames. ], tot_loss[loss=0.1652, simple_loss=0.2551, pruned_loss=0.03766, over 2065552.99 frames. 
], batch size: 28, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:31:24,829 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.628e+02 2.483e+02 3.007e+02 3.597e+02 5.788e+02, threshold=6.013e+02, percent-clipped=0.0 2023-05-18 20:31:47,944 INFO [finetune.py:992] (0/2) Epoch 18, batch 450, loss[loss=0.1614, simple_loss=0.2569, pruned_loss=0.03298, over 12373.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.2561, pruned_loss=0.03782, over 2131878.31 frames. ], batch size: 35, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:31:56,099 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5702, 5.1070, 5.5549, 4.8328, 5.2017, 4.9317, 5.5805, 5.2375], device='cuda:0'), covar=tensor([0.0258, 0.0375, 0.0249, 0.0273, 0.0420, 0.0379, 0.0200, 0.0273], device='cuda:0'), in_proj_covar=tensor([0.0270, 0.0276, 0.0299, 0.0272, 0.0269, 0.0269, 0.0245, 0.0219], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:32:23,764 INFO [finetune.py:992] (0/2) Epoch 18, batch 500, loss[loss=0.1872, simple_loss=0.2809, pruned_loss=0.04672, over 12270.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2563, pruned_loss=0.03797, over 2185459.06 frames. ], batch size: 37, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:32:35,723 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.630e+02 3.100e+02 3.585e+02 6.285e+02, threshold=6.200e+02, percent-clipped=2.0 2023-05-18 20:32:36,585 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.6405, 5.4469, 5.5457, 5.6238, 5.2653, 5.2977, 5.0613, 5.5034], device='cuda:0'), covar=tensor([0.0713, 0.0626, 0.0813, 0.0578, 0.1872, 0.1250, 0.0580, 0.1160], device='cuda:0'), in_proj_covar=tensor([0.0544, 0.0704, 0.0624, 0.0633, 0.0839, 0.0748, 0.0561, 0.0489], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:32:54,018 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0732, 6.1110, 5.7782, 5.3457, 5.2345, 5.9742, 5.6733, 5.3443], device='cuda:0'), covar=tensor([0.0685, 0.0833, 0.0745, 0.1590, 0.0762, 0.0735, 0.1510, 0.1196], device='cuda:0'), in_proj_covar=tensor([0.0650, 0.0581, 0.0530, 0.0650, 0.0436, 0.0738, 0.0793, 0.0581], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-18 20:32:58,587 INFO [finetune.py:992] (0/2) Epoch 18, batch 550, loss[loss=0.2365, simple_loss=0.3059, pruned_loss=0.08354, over 7946.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2561, pruned_loss=0.03802, over 2221477.53 frames. ], batch size: 98, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:33:14,354 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308552.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:33:18,195 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=308558.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:33:33,289 INFO [finetune.py:992] (0/2) Epoch 18, batch 600, loss[loss=0.175, simple_loss=0.2625, pruned_loss=0.04371, over 12112.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.256, pruned_loss=0.03754, over 2256326.34 frames. 
], batch size: 38, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:33:45,725 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.534e+02 3.033e+02 3.710e+02 8.402e+02, threshold=6.066e+02, percent-clipped=2.0 2023-05-18 20:33:57,975 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308613.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:34:09,482 INFO [finetune.py:992] (0/2) Epoch 18, batch 650, loss[loss=0.1369, simple_loss=0.2238, pruned_loss=0.02496, over 12273.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2553, pruned_loss=0.03703, over 2284708.18 frames. ], batch size: 28, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:34:26,869 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.69 vs. limit=5.0 2023-05-18 20:34:33,422 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=308665.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:34:36,138 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308669.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:34:43,510 INFO [finetune.py:992] (0/2) Epoch 18, batch 700, loss[loss=0.1646, simple_loss=0.2681, pruned_loss=0.03052, over 12120.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.255, pruned_loss=0.03676, over 2307173.21 frames. ], batch size: 39, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:34:55,203 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.630e+02 3.034e+02 3.699e+02 6.493e+02, threshold=6.068e+02, percent-clipped=1.0 2023-05-18 20:35:17,797 INFO [finetune.py:992] (0/2) Epoch 18, batch 750, loss[loss=0.1799, simple_loss=0.269, pruned_loss=0.04537, over 12072.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2549, pruned_loss=0.03696, over 2325645.58 frames. ], batch size: 42, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:35:17,957 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308730.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:35:19,294 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:35:23,303 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0731, 4.6914, 2.8884, 2.6517, 4.0206, 2.6619, 4.0177, 3.1007], device='cuda:0'), covar=tensor([0.0888, 0.0529, 0.1330, 0.1621, 0.0281, 0.1419, 0.0407, 0.0981], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0252, 0.0177, 0.0199, 0.0140, 0.0183, 0.0198, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:35:53,194 INFO [finetune.py:992] (0/2) Epoch 18, batch 800, loss[loss=0.1517, simple_loss=0.2438, pruned_loss=0.02977, over 12124.00 frames. ], tot_loss[loss=0.1648, simple_loss=0.255, pruned_loss=0.03728, over 2340727.10 frames. ], batch size: 33, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:36:02,533 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308793.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:36:04,943 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.681e+02 3.102e+02 3.780e+02 6.537e+02, threshold=6.204e+02, percent-clipped=1.0 2023-05-18 20:36:28,168 INFO [finetune.py:992] (0/2) Epoch 18, batch 850, loss[loss=0.1744, simple_loss=0.2659, pruned_loss=0.04145, over 12284.00 frames. ], tot_loss[loss=0.1648, simple_loss=0.255, pruned_loss=0.03733, over 2353993.48 frames. 
], batch size: 37, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:36:47,830 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308858.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:37:02,666 INFO [finetune.py:992] (0/2) Epoch 18, batch 900, loss[loss=0.1417, simple_loss=0.2424, pruned_loss=0.02049, over 12296.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2547, pruned_loss=0.03682, over 2362542.40 frames. ], batch size: 33, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:37:11,377 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4587, 4.7430, 4.1355, 5.0373, 4.7115, 2.9322, 4.2870, 3.1509], device='cuda:0'), covar=tensor([0.0770, 0.0848, 0.1614, 0.0549, 0.1066, 0.1888, 0.1165, 0.3483], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0384, 0.0366, 0.0335, 0.0376, 0.0280, 0.0353, 0.0373], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:37:15,123 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.650e+02 2.566e+02 3.018e+02 3.526e+02 6.170e+02, threshold=6.037e+02, percent-clipped=0.0 2023-05-18 20:37:22,009 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308906.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:37:23,379 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=308908.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:37:28,420 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-05-18 20:37:38,567 INFO [finetune.py:992] (0/2) Epoch 18, batch 950, loss[loss=0.1697, simple_loss=0.266, pruned_loss=0.03672, over 12345.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2545, pruned_loss=0.03659, over 2370919.59 frames. ], batch size: 36, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:37:53,881 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-05-18 20:37:55,520 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4846, 5.0508, 5.4692, 4.7693, 5.1246, 4.8852, 5.4894, 5.1598], device='cuda:0'), covar=tensor([0.0279, 0.0408, 0.0266, 0.0277, 0.0414, 0.0349, 0.0228, 0.0270], device='cuda:0'), in_proj_covar=tensor([0.0270, 0.0276, 0.0299, 0.0272, 0.0270, 0.0269, 0.0244, 0.0220], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:38:03,057 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308965.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:38:03,949 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.04 vs. limit=5.0 2023-05-18 20:38:13,226 INFO [finetune.py:992] (0/2) Epoch 18, batch 1000, loss[loss=0.1447, simple_loss=0.2352, pruned_loss=0.02713, over 12114.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2551, pruned_loss=0.03715, over 2353586.55 frames. 
], batch size: 30, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:38:24,917 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.628e+02 3.271e+02 3.766e+02 7.611e+02, threshold=6.542e+02, percent-clipped=1.0 2023-05-18 20:38:25,196 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2709, 3.7952, 3.9106, 4.2149, 2.9538, 3.7379, 2.5590, 3.7793], device='cuda:0'), covar=tensor([0.1667, 0.0840, 0.0919, 0.0696, 0.1188, 0.0662, 0.1900, 0.0949], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0272, 0.0300, 0.0358, 0.0246, 0.0245, 0.0264, 0.0372], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:38:36,525 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309013.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:38:44,755 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309025.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:38:48,174 INFO [finetune.py:992] (0/2) Epoch 18, batch 1050, loss[loss=0.1327, simple_loss=0.2209, pruned_loss=0.0223, over 12134.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2541, pruned_loss=0.03695, over 2363660.98 frames. ], batch size: 30, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:38:57,194 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3834, 4.9382, 5.3789, 4.6599, 4.9993, 4.7746, 5.3907, 5.0555], device='cuda:0'), covar=tensor([0.0270, 0.0429, 0.0273, 0.0296, 0.0456, 0.0367, 0.0209, 0.0284], device='cuda:0'), in_proj_covar=tensor([0.0273, 0.0279, 0.0302, 0.0274, 0.0273, 0.0271, 0.0246, 0.0222], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:39:13,045 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2733, 4.8030, 3.1033, 2.8874, 4.1537, 2.7636, 4.0755, 3.3399], device='cuda:0'), covar=tensor([0.0834, 0.0591, 0.1197, 0.1511, 0.0308, 0.1398, 0.0520, 0.0884], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0255, 0.0178, 0.0201, 0.0140, 0.0185, 0.0200, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:39:23,861 INFO [finetune.py:992] (0/2) Epoch 18, batch 1100, loss[loss=0.1638, simple_loss=0.2638, pruned_loss=0.03186, over 12189.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2538, pruned_loss=0.03704, over 2364081.28 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:39:29,462 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309088.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:39:35,509 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.720e+02 3.175e+02 3.829e+02 6.292e+02, threshold=6.350e+02, percent-clipped=0.0 2023-05-18 20:39:58,581 INFO [finetune.py:992] (0/2) Epoch 18, batch 1150, loss[loss=0.1617, simple_loss=0.2557, pruned_loss=0.03387, over 12144.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2536, pruned_loss=0.03705, over 2372702.28 frames. ], batch size: 36, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:40:04,612 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 20:40:33,684 INFO [finetune.py:992] (0/2) Epoch 18, batch 1200, loss[loss=0.1616, simple_loss=0.2458, pruned_loss=0.03871, over 11820.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2544, pruned_loss=0.03739, over 2370545.77 frames. 
], batch size: 26, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:40:45,662 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.688e+02 3.187e+02 3.612e+02 5.353e+02, threshold=6.374e+02, percent-clipped=0.0 2023-05-18 20:40:54,964 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=309208.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:41:09,968 INFO [finetune.py:992] (0/2) Epoch 18, batch 1250, loss[loss=0.1649, simple_loss=0.2535, pruned_loss=0.03817, over 12092.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2541, pruned_loss=0.03714, over 2373508.44 frames. ], batch size: 42, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:41:28,405 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309256.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:41:45,079 INFO [finetune.py:992] (0/2) Epoch 18, batch 1300, loss[loss=0.1458, simple_loss=0.2228, pruned_loss=0.03434, over 11994.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.252, pruned_loss=0.03653, over 2373081.58 frames. ], batch size: 28, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:41:56,941 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.593e+02 2.360e+02 2.832e+02 3.330e+02 7.735e+02, threshold=5.664e+02, percent-clipped=3.0 2023-05-18 20:42:05,512 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.87 vs. limit=5.0 2023-05-18 20:42:16,301 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=309325.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:42:19,784 INFO [finetune.py:992] (0/2) Epoch 18, batch 1350, loss[loss=0.1583, simple_loss=0.247, pruned_loss=0.03484, over 12185.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2528, pruned_loss=0.03688, over 2374070.74 frames. ], batch size: 29, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:42:50,543 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309373.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:42:55,228 INFO [finetune.py:992] (0/2) Epoch 18, batch 1400, loss[loss=0.1457, simple_loss=0.2314, pruned_loss=0.03, over 11815.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2533, pruned_loss=0.03718, over 2372998.14 frames. ], batch size: 26, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:43:00,848 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=309388.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:43:06,918 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.658e+02 3.169e+02 3.760e+02 1.278e+03, threshold=6.339e+02, percent-clipped=2.0 2023-05-18 20:43:07,270 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 20:43:16,967 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309411.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:43:30,045 INFO [finetune.py:992] (0/2) Epoch 18, batch 1450, loss[loss=0.1685, simple_loss=0.2536, pruned_loss=0.04165, over 12079.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2535, pruned_loss=0.03714, over 2373635.79 frames. ], batch size: 39, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:43:34,269 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309436.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:43:34,582 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. 
limit=2.0 2023-05-18 20:43:59,386 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309472.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:44:04,857 INFO [finetune.py:992] (0/2) Epoch 18, batch 1500, loss[loss=0.1458, simple_loss=0.2311, pruned_loss=0.03028, over 11994.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2533, pruned_loss=0.03671, over 2372280.89 frames. ], batch size: 28, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:44:17,153 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.631e+02 3.171e+02 3.851e+02 8.126e+02, threshold=6.342e+02, percent-clipped=2.0 2023-05-18 20:44:40,497 INFO [finetune.py:992] (0/2) Epoch 18, batch 1550, loss[loss=0.1449, simple_loss=0.2346, pruned_loss=0.02757, over 12195.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03648, over 2380911.04 frames. ], batch size: 31, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:44:44,896 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5067, 3.7080, 3.2570, 3.2454, 2.8107, 2.6962, 3.6354, 2.3739], device='cuda:0'), covar=tensor([0.0464, 0.0179, 0.0260, 0.0219, 0.0516, 0.0483, 0.0165, 0.0559], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0167, 0.0172, 0.0196, 0.0208, 0.0205, 0.0181, 0.0210], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:44:59,495 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3558, 4.7074, 4.1586, 4.9328, 4.6071, 2.8972, 4.1813, 2.9967], device='cuda:0'), covar=tensor([0.0803, 0.0765, 0.1397, 0.0696, 0.1114, 0.1859, 0.1143, 0.3655], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0386, 0.0368, 0.0338, 0.0380, 0.0283, 0.0356, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:45:10,444 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309573.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:45:14,849 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5376, 4.8229, 4.1359, 5.0596, 4.6250, 3.0564, 4.2604, 3.1582], device='cuda:0'), covar=tensor([0.0811, 0.0743, 0.1650, 0.0563, 0.1300, 0.1707, 0.1234, 0.3461], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0386, 0.0367, 0.0338, 0.0379, 0.0283, 0.0355, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:45:15,291 INFO [finetune.py:992] (0/2) Epoch 18, batch 1600, loss[loss=0.153, simple_loss=0.2347, pruned_loss=0.03567, over 12185.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2528, pruned_loss=0.03619, over 2383191.11 frames. 
], batch size: 29, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:45:24,433 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1080, 2.4798, 3.5874, 4.1856, 3.6565, 4.1571, 3.7955, 2.9140], device='cuda:0'), covar=tensor([0.0062, 0.0432, 0.0148, 0.0046, 0.0159, 0.0089, 0.0117, 0.0430], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0127, 0.0108, 0.0082, 0.0108, 0.0121, 0.0105, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:45:27,057 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.535e+02 2.970e+02 3.567e+02 5.666e+02, threshold=5.940e+02, percent-clipped=0.0 2023-05-18 20:45:45,079 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2756, 3.5527, 3.6600, 4.0016, 2.6388, 3.5339, 2.4693, 3.4575], device='cuda:0'), covar=tensor([0.1740, 0.1045, 0.1100, 0.0758, 0.1491, 0.0813, 0.2078, 0.1053], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0274, 0.0303, 0.0363, 0.0248, 0.0247, 0.0265, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:45:50,302 INFO [finetune.py:992] (0/2) Epoch 18, batch 1650, loss[loss=0.1645, simple_loss=0.2563, pruned_loss=0.03634, over 11797.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2529, pruned_loss=0.03612, over 2384543.12 frames. ], batch size: 44, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:45:51,910 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6884, 2.9352, 3.3371, 4.5338, 2.6706, 4.4875, 4.6363, 4.7044], device='cuda:0'), covar=tensor([0.0132, 0.1186, 0.0515, 0.0126, 0.1325, 0.0246, 0.0181, 0.0102], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0207, 0.0185, 0.0124, 0.0191, 0.0183, 0.0182, 0.0128], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:45:52,677 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7502, 3.3376, 5.1218, 2.6255, 2.8210, 3.6790, 3.2016, 3.7107], device='cuda:0'), covar=tensor([0.0492, 0.1290, 0.0344, 0.1327, 0.2147, 0.1696, 0.1477, 0.1250], device='cuda:0'), in_proj_covar=tensor([0.0239, 0.0241, 0.0260, 0.0187, 0.0243, 0.0299, 0.0230, 0.0272], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:45:53,326 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309634.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:46:12,737 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.07 vs. limit=5.0 2023-05-18 20:46:26,015 INFO [finetune.py:992] (0/2) Epoch 18, batch 1700, loss[loss=0.1545, simple_loss=0.24, pruned_loss=0.03453, over 12193.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03638, over 2368807.43 frames. ], batch size: 31, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:46:37,503 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.745e+02 3.098e+02 3.777e+02 1.829e+03, threshold=6.196e+02, percent-clipped=5.0 2023-05-18 20:47:00,281 INFO [finetune.py:992] (0/2) Epoch 18, batch 1750, loss[loss=0.1735, simple_loss=0.2698, pruned_loss=0.03853, over 12267.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2532, pruned_loss=0.03635, over 2374992.87 frames. 
], batch size: 37, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:47:06,389 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9695, 4.6716, 4.9509, 4.2685, 4.6396, 4.4143, 4.9776, 4.6616], device='cuda:0'), covar=tensor([0.0313, 0.0398, 0.0366, 0.0351, 0.0520, 0.0377, 0.0263, 0.0440], device='cuda:0'), in_proj_covar=tensor([0.0279, 0.0284, 0.0308, 0.0280, 0.0279, 0.0276, 0.0251, 0.0227], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:47:25,824 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309767.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:47:34,899 INFO [finetune.py:992] (0/2) Epoch 18, batch 1800, loss[loss=0.1553, simple_loss=0.242, pruned_loss=0.03433, over 12295.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2536, pruned_loss=0.03657, over 2372710.10 frames. ], batch size: 33, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:47:46,890 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.723e+02 3.116e+02 3.628e+02 7.661e+02, threshold=6.232e+02, percent-clipped=3.0 2023-05-18 20:47:51,758 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.62 vs. limit=2.0 2023-05-18 20:48:10,639 INFO [finetune.py:992] (0/2) Epoch 18, batch 1850, loss[loss=0.1631, simple_loss=0.2597, pruned_loss=0.0333, over 12124.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2534, pruned_loss=0.03641, over 2373806.68 frames. ], batch size: 38, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:48:12,351 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0808, 4.3112, 3.8838, 4.6874, 4.2415, 2.7253, 3.9472, 2.8933], device='cuda:0'), covar=tensor([0.0902, 0.1079, 0.1479, 0.0663, 0.1383, 0.1949, 0.1256, 0.3763], device='cuda:0'), in_proj_covar=tensor([0.0319, 0.0388, 0.0371, 0.0341, 0.0382, 0.0284, 0.0358, 0.0379], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:48:21,915 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309846.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:48:28,313 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309855.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:48:45,480 INFO [finetune.py:992] (0/2) Epoch 18, batch 1900, loss[loss=0.1559, simple_loss=0.2562, pruned_loss=0.02781, over 12359.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2537, pruned_loss=0.03657, over 2372566.82 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:48:57,078 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.14 vs. limit=2.0 2023-05-18 20:48:57,369 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.578e+02 3.098e+02 3.456e+02 8.293e+02, threshold=6.197e+02, percent-clipped=1.0 2023-05-18 20:49:04,602 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309907.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 20:49:10,868 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309916.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:49:11,110 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. 
limit=2.0 2023-05-18 20:49:19,603 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309929.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:49:20,277 INFO [finetune.py:992] (0/2) Epoch 18, batch 1950, loss[loss=0.1547, simple_loss=0.248, pruned_loss=0.03071, over 12288.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03605, over 2372186.30 frames. ], batch size: 33, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:49:56,058 INFO [finetune.py:992] (0/2) Epoch 18, batch 2000, loss[loss=0.1553, simple_loss=0.243, pruned_loss=0.03381, over 12350.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2519, pruned_loss=0.03582, over 2380834.40 frames. ], batch size: 31, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:49:56,932 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309981.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:50:08,034 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.535e+02 2.884e+02 3.403e+02 2.001e+03, threshold=5.769e+02, percent-clipped=2.0 2023-05-18 20:50:10,418 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-210000.pt 2023-05-18 20:50:33,271 INFO [finetune.py:992] (0/2) Epoch 18, batch 2050, loss[loss=0.1842, simple_loss=0.2682, pruned_loss=0.05013, over 12158.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2523, pruned_loss=0.03629, over 2373075.80 frames. ], batch size: 34, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:50:41,762 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310042.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:50:58,474 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310067.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:51:02,280 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.89 vs. limit=5.0 2023-05-18 20:51:07,310 INFO [finetune.py:992] (0/2) Epoch 18, batch 2100, loss[loss=0.167, simple_loss=0.259, pruned_loss=0.0375, over 12153.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2523, pruned_loss=0.03643, over 2381171.33 frames. ], batch size: 36, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:51:15,883 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.50 vs. limit=2.0 2023-05-18 20:51:20,283 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.635e+02 3.197e+02 3.890e+02 6.389e+02, threshold=6.395e+02, percent-clipped=3.0 2023-05-18 20:51:32,962 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310115.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:51:43,378 INFO [finetune.py:992] (0/2) Epoch 18, batch 2150, loss[loss=0.1508, simple_loss=0.2371, pruned_loss=0.03223, over 12261.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2526, pruned_loss=0.03647, over 2380072.22 frames. 
], batch size: 32, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:51:50,612 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2136, 4.9047, 5.1441, 5.0734, 4.7453, 5.1726, 5.0233, 2.8683], device='cuda:0'), covar=tensor([0.0099, 0.0072, 0.0069, 0.0060, 0.0055, 0.0102, 0.0099, 0.0728], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0081, 0.0086, 0.0076, 0.0063, 0.0097, 0.0085, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:51:55,529 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7192, 3.8044, 3.3090, 3.3155, 3.0912, 3.0203, 3.7949, 2.6301], device='cuda:0'), covar=tensor([0.0390, 0.0128, 0.0255, 0.0208, 0.0367, 0.0337, 0.0142, 0.0452], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0166, 0.0171, 0.0194, 0.0205, 0.0203, 0.0180, 0.0208], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:52:03,732 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4641, 2.6609, 3.4101, 4.3874, 2.3638, 4.4056, 4.5597, 4.5870], device='cuda:0'), covar=tensor([0.0182, 0.1440, 0.0510, 0.0184, 0.1517, 0.0212, 0.0148, 0.0130], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0208, 0.0187, 0.0125, 0.0192, 0.0185, 0.0183, 0.0128], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:52:18,103 INFO [finetune.py:992] (0/2) Epoch 18, batch 2200, loss[loss=0.1778, simple_loss=0.2672, pruned_loss=0.0442, over 12129.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2516, pruned_loss=0.0363, over 2381960.55 frames. ], batch size: 38, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:52:19,257 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-18 20:52:21,057 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5121, 5.2510, 5.3914, 5.4598, 5.0844, 5.1705, 4.8688, 5.4016], device='cuda:0'), covar=tensor([0.0626, 0.0701, 0.0810, 0.0692, 0.2019, 0.1365, 0.0540, 0.1020], device='cuda:0'), in_proj_covar=tensor([0.0561, 0.0732, 0.0644, 0.0652, 0.0871, 0.0774, 0.0579, 0.0501], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:52:29,812 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.512e+02 2.961e+02 3.502e+02 5.832e+02, threshold=5.922e+02, percent-clipped=0.0 2023-05-18 20:52:33,718 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310202.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 20:52:39,861 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310211.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:52:52,417 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310229.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:52:53,011 INFO [finetune.py:992] (0/2) Epoch 18, batch 2250, loss[loss=0.1448, simple_loss=0.2286, pruned_loss=0.03049, over 12336.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2516, pruned_loss=0.03613, over 2386013.35 frames. 
], batch size: 31, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:53:20,068 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2001, 4.7224, 4.1041, 4.9526, 4.4055, 2.8882, 4.1219, 3.0724], device='cuda:0'), covar=tensor([0.0939, 0.0714, 0.1623, 0.0508, 0.1201, 0.1880, 0.1209, 0.3512], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0382, 0.0365, 0.0336, 0.0374, 0.0280, 0.0350, 0.0371], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:53:26,749 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310277.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:53:28,803 INFO [finetune.py:992] (0/2) Epoch 18, batch 2300, loss[loss=0.1571, simple_loss=0.2564, pruned_loss=0.02893, over 12287.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2524, pruned_loss=0.03618, over 2384860.54 frames. ], batch size: 37, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:53:33,841 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=310287.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:53:40,753 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.561e+02 3.053e+02 3.561e+02 7.205e+02, threshold=6.106e+02, percent-clipped=2.0 2023-05-18 20:53:45,884 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3798, 2.1200, 3.8408, 4.4036, 4.0110, 4.3624, 4.0159, 3.2423], device='cuda:0'), covar=tensor([0.0054, 0.0578, 0.0126, 0.0045, 0.0120, 0.0092, 0.0120, 0.0394], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0127, 0.0109, 0.0083, 0.0109, 0.0121, 0.0106, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:54:03,543 INFO [finetune.py:992] (0/2) Epoch 18, batch 2350, loss[loss=0.1703, simple_loss=0.2514, pruned_loss=0.04455, over 12299.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2519, pruned_loss=0.03585, over 2388653.85 frames. ], batch size: 34, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:54:08,487 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310337.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:54:16,271 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310348.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:54:38,137 INFO [finetune.py:992] (0/2) Epoch 18, batch 2400, loss[loss=0.1747, simple_loss=0.2616, pruned_loss=0.04387, over 12287.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2516, pruned_loss=0.03562, over 2394830.90 frames. ], batch size: 37, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:54:45,506 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. 
limit=2.0 2023-05-18 20:54:51,055 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.488e+02 2.575e+02 3.256e+02 3.752e+02 1.185e+03, threshold=6.512e+02, percent-clipped=4.0 2023-05-18 20:54:54,180 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4129, 5.1990, 5.2983, 5.4275, 5.0238, 5.0528, 4.7646, 5.3196], device='cuda:0'), covar=tensor([0.0763, 0.0682, 0.0924, 0.0562, 0.2024, 0.1566, 0.0652, 0.1182], device='cuda:0'), in_proj_covar=tensor([0.0564, 0.0734, 0.0651, 0.0654, 0.0875, 0.0781, 0.0583, 0.0506], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:54:55,160 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-18 20:55:04,094 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.6338, 5.2416, 5.6137, 4.9652, 5.3085, 5.0708, 5.6628, 5.2044], device='cuda:0'), covar=tensor([0.0223, 0.0358, 0.0255, 0.0234, 0.0366, 0.0275, 0.0168, 0.0346], device='cuda:0'), in_proj_covar=tensor([0.0276, 0.0282, 0.0305, 0.0278, 0.0276, 0.0275, 0.0248, 0.0223], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:55:14,423 INFO [finetune.py:992] (0/2) Epoch 18, batch 2450, loss[loss=0.1583, simple_loss=0.259, pruned_loss=0.02883, over 12352.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2503, pruned_loss=0.03527, over 2394294.94 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:55:26,963 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9946, 4.6598, 4.7252, 4.8925, 4.6522, 4.9469, 4.7919, 2.5933], device='cuda:0'), covar=tensor([0.0131, 0.0081, 0.0109, 0.0075, 0.0068, 0.0113, 0.0099, 0.0927], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0082, 0.0086, 0.0076, 0.0063, 0.0098, 0.0085, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:55:35,349 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1102, 4.5043, 4.0212, 4.7667, 4.4308, 2.6760, 4.0337, 3.1334], device='cuda:0'), covar=tensor([0.0988, 0.0826, 0.1549, 0.0666, 0.1160, 0.2094, 0.1372, 0.3513], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0382, 0.0364, 0.0336, 0.0374, 0.0280, 0.0350, 0.0371], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:55:49,098 INFO [finetune.py:992] (0/2) Epoch 18, batch 2500, loss[loss=0.1593, simple_loss=0.2484, pruned_loss=0.03509, over 12280.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2501, pruned_loss=0.03512, over 2390286.43 frames. ], batch size: 33, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:55:50,372 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.94 vs. 
limit=5.0 2023-05-18 20:56:00,812 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.642e+02 2.609e+02 3.033e+02 3.813e+02 1.184e+03, threshold=6.066e+02, percent-clipped=3.0 2023-05-18 20:56:04,473 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310502.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:10,841 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310511.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:12,963 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8291, 4.4594, 4.4897, 4.6812, 4.4872, 4.7624, 4.6594, 2.4727], device='cuda:0'), covar=tensor([0.0110, 0.0076, 0.0113, 0.0068, 0.0065, 0.0103, 0.0095, 0.0951], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0085, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:56:20,786 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2733, 2.4862, 3.1689, 4.1494, 2.2134, 4.1726, 4.2296, 4.3036], device='cuda:0'), covar=tensor([0.0143, 0.1313, 0.0521, 0.0191, 0.1469, 0.0262, 0.0208, 0.0130], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0208, 0.0186, 0.0124, 0.0192, 0.0184, 0.0183, 0.0128], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:56:24,093 INFO [finetune.py:992] (0/2) Epoch 18, batch 2550, loss[loss=0.1625, simple_loss=0.2528, pruned_loss=0.03612, over 12077.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2504, pruned_loss=0.03517, over 2383409.34 frames. ], batch size: 40, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:56:27,764 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4126, 5.1610, 5.2819, 5.3751, 4.9574, 4.9951, 4.7119, 5.3076], device='cuda:0'), covar=tensor([0.0673, 0.0678, 0.0991, 0.0585, 0.2091, 0.1617, 0.0614, 0.1014], device='cuda:0'), in_proj_covar=tensor([0.0565, 0.0733, 0.0650, 0.0654, 0.0877, 0.0783, 0.0583, 0.0503], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:56:39,239 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310550.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:40,009 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0169, 4.6440, 4.6797, 4.9109, 4.7146, 4.9329, 4.8125, 2.7326], device='cuda:0'), covar=tensor([0.0084, 0.0071, 0.0102, 0.0058, 0.0052, 0.0098, 0.0080, 0.0852], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0082, 0.0086, 0.0076, 0.0063, 0.0097, 0.0085, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 20:56:45,443 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310559.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:57:00,024 INFO [finetune.py:992] (0/2) Epoch 18, batch 2600, loss[loss=0.1566, simple_loss=0.2552, pruned_loss=0.02904, over 12110.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2513, pruned_loss=0.03529, over 2379770.56 frames. 
], batch size: 33, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:57:11,842 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.539e+02 2.969e+02 3.489e+02 1.170e+03, threshold=5.939e+02, percent-clipped=2.0 2023-05-18 20:57:35,061 INFO [finetune.py:992] (0/2) Epoch 18, batch 2650, loss[loss=0.1773, simple_loss=0.2663, pruned_loss=0.04414, over 12031.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2517, pruned_loss=0.03528, over 2383508.90 frames. ], batch size: 42, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:57:40,038 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310637.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:57:44,154 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310643.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:57:48,506 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=310649.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:58:09,948 INFO [finetune.py:992] (0/2) Epoch 18, batch 2700, loss[loss=0.1624, simple_loss=0.2536, pruned_loss=0.03557, over 12182.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2511, pruned_loss=0.0352, over 2383206.45 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:58:14,148 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310685.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:58:22,055 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.573e+02 2.755e+02 3.171e+02 3.711e+02 7.939e+02, threshold=6.342e+02, percent-clipped=1.0 2023-05-18 20:58:25,162 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7518, 2.7800, 4.5356, 4.6109, 2.9010, 2.6325, 2.9504, 2.2518], device='cuda:0'), covar=tensor([0.1652, 0.3112, 0.0462, 0.0437, 0.1303, 0.2504, 0.2741, 0.4023], device='cuda:0'), in_proj_covar=tensor([0.0306, 0.0391, 0.0277, 0.0304, 0.0277, 0.0321, 0.0401, 0.0379], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 20:58:31,456 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310710.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:58:45,107 INFO [finetune.py:992] (0/2) Epoch 18, batch 2750, loss[loss=0.1863, simple_loss=0.2787, pruned_loss=0.047, over 12274.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2506, pruned_loss=0.03516, over 2379032.04 frames. 
], batch size: 37, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:59:11,895 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2287, 4.6163, 2.8901, 2.4678, 3.9762, 2.5473, 3.9394, 3.1567], device='cuda:0'), covar=tensor([0.0705, 0.0612, 0.1114, 0.1544, 0.0340, 0.1328, 0.0490, 0.0817], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0259, 0.0180, 0.0201, 0.0143, 0.0185, 0.0203, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:59:15,836 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9360, 4.7968, 4.7623, 4.8169, 4.4270, 4.9664, 4.9181, 5.1086], device='cuda:0'), covar=tensor([0.0274, 0.0171, 0.0197, 0.0403, 0.0846, 0.0301, 0.0146, 0.0205], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0204, 0.0198, 0.0255, 0.0251, 0.0231, 0.0184, 0.0238], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 20:59:19,898 INFO [finetune.py:992] (0/2) Epoch 18, batch 2800, loss[loss=0.1671, simple_loss=0.2545, pruned_loss=0.03981, over 12091.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.03559, over 2380835.90 frames. ], batch size: 32, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:59:29,733 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9203, 4.8085, 4.7612, 4.8134, 4.4414, 4.9292, 4.8934, 5.0992], device='cuda:0'), covar=tensor([0.0287, 0.0179, 0.0208, 0.0378, 0.0795, 0.0392, 0.0174, 0.0194], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0204, 0.0198, 0.0254, 0.0251, 0.0231, 0.0184, 0.0238], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 20:59:31,742 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.653e+02 3.113e+02 3.609e+02 5.905e+02, threshold=6.226e+02, percent-clipped=0.0 2023-05-18 20:59:34,326 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0597, 6.0308, 5.8117, 5.2467, 5.2545, 5.9550, 5.5365, 5.3402], device='cuda:0'), covar=tensor([0.0793, 0.0985, 0.0776, 0.1873, 0.0734, 0.0770, 0.1661, 0.1247], device='cuda:0'), in_proj_covar=tensor([0.0650, 0.0583, 0.0535, 0.0658, 0.0436, 0.0754, 0.0811, 0.0586], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 20:59:36,494 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=310803.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:59:44,768 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5996, 5.4076, 5.5236, 5.5859, 5.2046, 5.1702, 4.9604, 5.4601], device='cuda:0'), covar=tensor([0.0631, 0.0577, 0.0747, 0.0499, 0.1807, 0.1472, 0.0518, 0.1197], device='cuda:0'), in_proj_covar=tensor([0.0570, 0.0736, 0.0653, 0.0656, 0.0883, 0.0785, 0.0587, 0.0507], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 20:59:56,133 INFO [finetune.py:992] (0/2) Epoch 18, batch 2850, loss[loss=0.1958, simple_loss=0.2808, pruned_loss=0.05543, over 8007.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2502, pruned_loss=0.03543, over 2378393.43 frames. 
], batch size: 100, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:00:10,349 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.4544, 3.5438, 3.2030, 3.1666, 2.8636, 2.7805, 3.5327, 2.1302], device='cuda:0'), covar=tensor([0.0473, 0.0176, 0.0218, 0.0233, 0.0474, 0.0419, 0.0150, 0.0653], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0167, 0.0171, 0.0195, 0.0206, 0.0205, 0.0181, 0.0208], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:00:19,913 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310864.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:00:30,600 INFO [finetune.py:992] (0/2) Epoch 18, batch 2900, loss[loss=0.157, simple_loss=0.2557, pruned_loss=0.02913, over 12271.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2512, pruned_loss=0.0357, over 2369321.68 frames. ], batch size: 33, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:00:42,443 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.591e+02 3.025e+02 3.364e+02 5.558e+02, threshold=6.049e+02, percent-clipped=0.0 2023-05-18 21:01:05,199 INFO [finetune.py:992] (0/2) Epoch 18, batch 2950, loss[loss=0.1619, simple_loss=0.2573, pruned_loss=0.03326, over 12093.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2507, pruned_loss=0.03568, over 2372058.86 frames. ], batch size: 32, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:01:14,370 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310943.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:01:19,452 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7877, 2.9301, 4.7840, 4.8807, 2.9522, 2.7075, 3.0449, 2.3016], device='cuda:0'), covar=tensor([0.1749, 0.3230, 0.0393, 0.0410, 0.1392, 0.2621, 0.2929, 0.4124], device='cuda:0'), in_proj_covar=tensor([0.0309, 0.0396, 0.0280, 0.0307, 0.0280, 0.0324, 0.0406, 0.0383], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:01:40,648 INFO [finetune.py:992] (0/2) Epoch 18, batch 3000, loss[loss=0.1582, simple_loss=0.2499, pruned_loss=0.03328, over 12093.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2512, pruned_loss=0.03602, over 2374317.22 frames. ], batch size: 32, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:01:40,648 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 21:01:47,839 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1318, 3.1650, 3.0822, 2.8652, 2.5663, 2.5524, 3.0915, 1.9979], device='cuda:0'), covar=tensor([0.0452, 0.0190, 0.0151, 0.0269, 0.0396, 0.0341, 0.0179, 0.0546], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0169, 0.0173, 0.0197, 0.0208, 0.0206, 0.0182, 0.0209], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:01:58,644 INFO [finetune.py:1026] (0/2) Epoch 18, validation: loss=0.3133, simple_loss=0.3898, pruned_loss=0.1184, over 1020973.00 frames. 
2023-05-18 21:01:58,644 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12508MB 2023-05-18 21:02:06,228 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310991.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:02:10,464 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.831e+02 3.206e+02 3.892e+02 6.977e+02, threshold=6.412e+02, percent-clipped=3.0 2023-05-18 21:02:16,462 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311005.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:02:19,660 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.24 vs. limit=5.0 2023-05-18 21:02:32,474 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311028.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:02:33,750 INFO [finetune.py:992] (0/2) Epoch 18, batch 3050, loss[loss=0.1488, simple_loss=0.248, pruned_loss=0.02481, over 12153.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2515, pruned_loss=0.03595, over 2376680.44 frames. ], batch size: 34, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:02:49,741 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5629, 4.0979, 4.2840, 4.6059, 3.2097, 4.0381, 2.8632, 4.2378], device='cuda:0'), covar=tensor([0.1373, 0.0747, 0.0890, 0.0603, 0.1068, 0.0547, 0.1593, 0.1159], device='cuda:0'), in_proj_covar=tensor([0.0232, 0.0273, 0.0302, 0.0362, 0.0247, 0.0246, 0.0264, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:03:09,061 INFO [finetune.py:992] (0/2) Epoch 18, batch 3100, loss[loss=0.1611, simple_loss=0.2494, pruned_loss=0.03636, over 12288.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2513, pruned_loss=0.03591, over 2378825.53 frames. ], batch size: 33, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:03:16,005 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311089.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:03:16,846 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-18 21:03:21,386 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.622e+02 2.881e+02 3.500e+02 8.081e+02, threshold=5.763e+02, percent-clipped=2.0 2023-05-18 21:03:38,270 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7843, 2.3056, 3.3273, 3.8149, 3.4796, 3.7542, 3.4117, 2.6538], device='cuda:0'), covar=tensor([0.0064, 0.0441, 0.0159, 0.0051, 0.0139, 0.0111, 0.0151, 0.0435], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0128, 0.0110, 0.0083, 0.0111, 0.0122, 0.0108, 0.0145], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:03:44,194 INFO [finetune.py:992] (0/2) Epoch 18, batch 3150, loss[loss=0.1886, simple_loss=0.2774, pruned_loss=0.04986, over 12095.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.252, pruned_loss=0.03617, over 2375632.63 frames. 
], batch size: 42, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:03:50,417 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1839, 4.7238, 5.1845, 4.4435, 4.8069, 4.6309, 5.2255, 4.9141], device='cuda:0'), covar=tensor([0.0322, 0.0490, 0.0287, 0.0313, 0.0485, 0.0346, 0.0214, 0.0362], device='cuda:0'), in_proj_covar=tensor([0.0281, 0.0286, 0.0309, 0.0281, 0.0280, 0.0279, 0.0252, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:04:04,467 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311159.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:04:18,871 INFO [finetune.py:992] (0/2) Epoch 18, batch 3200, loss[loss=0.1601, simple_loss=0.2572, pruned_loss=0.03148, over 11253.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2515, pruned_loss=0.03578, over 2372459.57 frames. ], batch size: 55, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:04:31,363 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.620e+02 3.041e+02 3.528e+02 9.797e+02, threshold=6.082e+02, percent-clipped=4.0 2023-05-18 21:04:40,193 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-18 21:04:54,805 INFO [finetune.py:992] (0/2) Epoch 18, batch 3250, loss[loss=0.1536, simple_loss=0.2467, pruned_loss=0.03025, over 12309.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2515, pruned_loss=0.03592, over 2376581.24 frames. ], batch size: 34, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:05:23,531 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4253, 2.4562, 3.6755, 4.4286, 3.8711, 4.4212, 3.8470, 3.1356], device='cuda:0'), covar=tensor([0.0047, 0.0470, 0.0158, 0.0047, 0.0155, 0.0078, 0.0147, 0.0414], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0128, 0.0110, 0.0083, 0.0110, 0.0122, 0.0107, 0.0145], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:05:29,455 INFO [finetune.py:992] (0/2) Epoch 18, batch 3300, loss[loss=0.1795, simple_loss=0.2704, pruned_loss=0.04427, over 12129.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2519, pruned_loss=0.036, over 2378773.40 frames. ], batch size: 39, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:05:42,077 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.684e+02 3.108e+02 3.750e+02 5.472e+02, threshold=6.215e+02, percent-clipped=0.0 2023-05-18 21:05:47,208 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311305.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:06:00,029 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0732, 4.5986, 3.9377, 4.8502, 4.3405, 2.9488, 4.0978, 2.8334], device='cuda:0'), covar=tensor([0.1026, 0.0764, 0.1551, 0.0627, 0.1317, 0.1815, 0.1303, 0.3923], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0387, 0.0368, 0.0343, 0.0380, 0.0283, 0.0353, 0.0375], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:06:04,696 INFO [finetune.py:992] (0/2) Epoch 18, batch 3350, loss[loss=0.1833, simple_loss=0.2676, pruned_loss=0.04944, over 12060.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2519, pruned_loss=0.03607, over 2375876.62 frames. 
], batch size: 37, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:06:06,338 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311332.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:06:20,374 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=311353.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:06:38,445 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2171, 4.5431, 2.7516, 2.4574, 3.8999, 2.4395, 3.8224, 3.1196], device='cuda:0'), covar=tensor([0.0794, 0.0486, 0.1289, 0.1582, 0.0383, 0.1425, 0.0558, 0.0901], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0261, 0.0179, 0.0202, 0.0144, 0.0186, 0.0204, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:06:40,388 INFO [finetune.py:992] (0/2) Epoch 18, batch 3400, loss[loss=0.1518, simple_loss=0.2451, pruned_loss=0.0293, over 12193.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2524, pruned_loss=0.03648, over 2374214.21 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:06:43,302 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311384.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:06:49,600 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311393.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:06:52,741 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.523e+02 3.046e+02 3.559e+02 6.317e+02, threshold=6.092e+02, percent-clipped=2.0 2023-05-18 21:06:59,520 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0777, 2.4748, 3.6595, 2.9856, 3.4028, 3.1412, 2.5604, 3.5321], device='cuda:0'), covar=tensor([0.0176, 0.0438, 0.0197, 0.0295, 0.0196, 0.0224, 0.0436, 0.0158], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0217, 0.0205, 0.0200, 0.0231, 0.0179, 0.0211, 0.0202], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:07:08,246 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2056, 2.5654, 3.7473, 3.0839, 3.4931, 3.2840, 2.6914, 3.6116], device='cuda:0'), covar=tensor([0.0130, 0.0421, 0.0176, 0.0261, 0.0188, 0.0190, 0.0393, 0.0138], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0216, 0.0204, 0.0200, 0.0231, 0.0178, 0.0210, 0.0202], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:07:15,175 INFO [finetune.py:992] (0/2) Epoch 18, batch 3450, loss[loss=0.1633, simple_loss=0.2556, pruned_loss=0.03553, over 12365.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2526, pruned_loss=0.03668, over 2371616.25 frames. ], batch size: 36, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:07:35,825 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311459.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:07:36,973 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-05-18 21:07:50,026 INFO [finetune.py:992] (0/2) Epoch 18, batch 3500, loss[loss=0.1659, simple_loss=0.263, pruned_loss=0.03444, over 12354.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.252, pruned_loss=0.03636, over 2381840.05 frames. 
], batch size: 36, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:08:02,424 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.630e+02 3.039e+02 3.536e+02 5.562e+02, threshold=6.078e+02, percent-clipped=0.0 2023-05-18 21:08:08,702 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=311507.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:08:25,650 INFO [finetune.py:992] (0/2) Epoch 18, batch 3550, loss[loss=0.1383, simple_loss=0.2245, pruned_loss=0.02606, over 12254.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2518, pruned_loss=0.03627, over 2377868.36 frames. ], batch size: 32, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:08:51,229 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311567.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:09:00,002 INFO [finetune.py:992] (0/2) Epoch 18, batch 3600, loss[loss=0.1657, simple_loss=0.2605, pruned_loss=0.03544, over 12063.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.252, pruned_loss=0.03636, over 2371239.12 frames. ], batch size: 42, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:09:12,577 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.695e+02 3.162e+02 3.738e+02 6.009e+02, threshold=6.324e+02, percent-clipped=0.0 2023-05-18 21:09:14,723 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-05-18 21:09:19,390 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7916, 3.5457, 5.2329, 2.7649, 2.9465, 3.8132, 3.2516, 3.9423], device='cuda:0'), covar=tensor([0.0421, 0.1089, 0.0282, 0.1222, 0.1937, 0.1486, 0.1381, 0.1094], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0242, 0.0264, 0.0190, 0.0245, 0.0302, 0.0232, 0.0276], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:09:34,245 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311628.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:09:35,385 INFO [finetune.py:992] (0/2) Epoch 18, batch 3650, loss[loss=0.1623, simple_loss=0.249, pruned_loss=0.03782, over 12293.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2522, pruned_loss=0.03649, over 2370752.83 frames. ], batch size: 33, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:10:11,026 INFO [finetune.py:992] (0/2) Epoch 18, batch 3700, loss[loss=0.1861, simple_loss=0.2756, pruned_loss=0.04829, over 12349.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.253, pruned_loss=0.03678, over 2361875.42 frames. ], batch size: 35, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:10:13,803 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311684.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:14,623 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311685.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:16,491 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311688.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:10:18,089 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.64 vs. 
limit=5.0 2023-05-18 21:10:23,220 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.626e+02 3.034e+02 3.864e+02 3.281e+03, threshold=6.068e+02, percent-clipped=3.0 2023-05-18 21:10:43,050 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311726.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:45,589 INFO [finetune.py:992] (0/2) Epoch 18, batch 3750, loss[loss=0.1435, simple_loss=0.2359, pruned_loss=0.02551, over 12162.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2523, pruned_loss=0.03649, over 2364250.97 frames. ], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:10:46,955 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=311732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:56,726 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311746.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:59,022 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-05-18 21:11:13,218 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8296, 2.9529, 4.4049, 4.6634, 2.8526, 2.6999, 3.0256, 2.2444], device='cuda:0'), covar=tensor([0.1835, 0.3130, 0.0595, 0.0495, 0.1528, 0.2775, 0.2773, 0.4291], device='cuda:0'), in_proj_covar=tensor([0.0313, 0.0401, 0.0283, 0.0310, 0.0284, 0.0327, 0.0411, 0.0387], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:11:19,890 INFO [finetune.py:992] (0/2) Epoch 18, batch 3800, loss[loss=0.1823, simple_loss=0.277, pruned_loss=0.04378, over 12356.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2523, pruned_loss=0.03649, over 2370701.97 frames. ], batch size: 38, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:11:25,095 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311787.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:11:25,858 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6489, 3.2866, 5.1337, 2.6823, 2.8340, 3.7882, 3.1731, 3.9298], device='cuda:0'), covar=tensor([0.0504, 0.1265, 0.0376, 0.1302, 0.2105, 0.1631, 0.1550, 0.1247], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0243, 0.0266, 0.0190, 0.0246, 0.0304, 0.0233, 0.0278], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:11:32,675 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.626e+02 3.008e+02 3.427e+02 6.541e+02, threshold=6.016e+02, percent-clipped=1.0 2023-05-18 21:11:49,761 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6356, 4.3901, 4.3407, 4.5513, 4.3459, 4.5265, 4.4749, 2.5679], device='cuda:0'), covar=tensor([0.0115, 0.0080, 0.0122, 0.0074, 0.0062, 0.0115, 0.0102, 0.0884], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0083, 0.0087, 0.0077, 0.0064, 0.0098, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:11:56,547 INFO [finetune.py:992] (0/2) Epoch 18, batch 3850, loss[loss=0.2045, simple_loss=0.2905, pruned_loss=0.05928, over 12123.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2534, pruned_loss=0.03689, over 2370482.52 frames. 
], batch size: 38, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:12:13,644 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311854.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:12:23,133 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-18 21:12:31,638 INFO [finetune.py:992] (0/2) Epoch 18, batch 3900, loss[loss=0.1986, simple_loss=0.2852, pruned_loss=0.05598, over 12366.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2524, pruned_loss=0.03667, over 2362957.19 frames. ], batch size: 38, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:12:43,594 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0929, 3.7390, 5.4727, 3.1182, 3.1480, 4.0625, 3.5215, 4.1463], device='cuda:0'), covar=tensor([0.0374, 0.0986, 0.0280, 0.1088, 0.1953, 0.1572, 0.1282, 0.1019], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0242, 0.0265, 0.0189, 0.0246, 0.0303, 0.0232, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:12:44,050 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.717e+02 3.082e+02 3.749e+02 5.734e+02, threshold=6.165e+02, percent-clipped=0.0 2023-05-18 21:12:56,222 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311915.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:13:01,641 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311923.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:13:06,271 INFO [finetune.py:992] (0/2) Epoch 18, batch 3950, loss[loss=0.1763, simple_loss=0.2701, pruned_loss=0.04128, over 12359.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2522, pruned_loss=0.03662, over 2363168.66 frames. ], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:13:41,829 INFO [finetune.py:992] (0/2) Epoch 18, batch 4000, loss[loss=0.1544, simple_loss=0.2451, pruned_loss=0.03189, over 12298.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2519, pruned_loss=0.03632, over 2372547.73 frames. ], batch size: 33, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:13:47,350 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311988.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:13:54,049 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 2.734e+02 3.153e+02 3.912e+02 1.026e+03, threshold=6.306e+02, percent-clipped=4.0 2023-05-18 21:13:55,820 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-212000.pt 2023-05-18 21:14:10,749 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0134, 4.4608, 3.9289, 4.7257, 4.3034, 2.8667, 4.0466, 2.9081], device='cuda:0'), covar=tensor([0.1019, 0.0850, 0.1513, 0.0626, 0.1244, 0.1842, 0.1153, 0.3627], device='cuda:0'), in_proj_covar=tensor([0.0320, 0.0392, 0.0370, 0.0345, 0.0383, 0.0285, 0.0357, 0.0377], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:14:19,525 INFO [finetune.py:992] (0/2) Epoch 18, batch 4050, loss[loss=0.1514, simple_loss=0.2488, pruned_loss=0.027, over 12366.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2522, pruned_loss=0.03663, over 2365587.41 frames. 
], batch size: 35, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:14:23,841 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312036.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:14:27,424 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312041.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:14:35,478 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-18 21:14:38,598 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3223, 4.5715, 2.8987, 2.6146, 3.9142, 2.6988, 3.8867, 3.4179], device='cuda:0'), covar=tensor([0.0704, 0.0694, 0.1094, 0.1667, 0.0374, 0.1304, 0.0534, 0.0717], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0261, 0.0179, 0.0201, 0.0143, 0.0185, 0.0203, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:14:54,366 INFO [finetune.py:992] (0/2) Epoch 18, batch 4100, loss[loss=0.1744, simple_loss=0.2663, pruned_loss=0.04122, over 12034.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2526, pruned_loss=0.03677, over 2364495.66 frames. ], batch size: 40, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:14:55,834 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312082.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:15:06,866 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.580e+02 3.050e+02 3.639e+02 7.290e+02, threshold=6.100e+02, percent-clipped=1.0 2023-05-18 21:15:10,418 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2322, 4.7510, 5.2039, 4.5137, 4.8290, 4.6458, 5.2326, 4.9171], device='cuda:0'), covar=tensor([0.0305, 0.0496, 0.0334, 0.0301, 0.0508, 0.0336, 0.0265, 0.0350], device='cuda:0'), in_proj_covar=tensor([0.0280, 0.0285, 0.0308, 0.0279, 0.0280, 0.0276, 0.0251, 0.0225], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:15:17,242 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1800, 4.7983, 4.9722, 4.9865, 4.9417, 5.0544, 4.9524, 2.7245], device='cuda:0'), covar=tensor([0.0087, 0.0075, 0.0086, 0.0062, 0.0046, 0.0104, 0.0079, 0.0832], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0083, 0.0087, 0.0077, 0.0063, 0.0098, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:15:30,303 INFO [finetune.py:992] (0/2) Epoch 18, batch 4150, loss[loss=0.1287, simple_loss=0.2112, pruned_loss=0.02312, over 12163.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.252, pruned_loss=0.03634, over 2364227.67 frames. ], batch size: 29, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:15:57,531 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.50 vs. limit=2.0 2023-05-18 21:16:04,692 INFO [finetune.py:992] (0/2) Epoch 18, batch 4200, loss[loss=0.1875, simple_loss=0.2779, pruned_loss=0.04856, over 12125.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2518, pruned_loss=0.03615, over 2374531.80 frames. 
], batch size: 39, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:16:17,282 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.655e+02 3.216e+02 3.929e+02 7.949e+02, threshold=6.433e+02, percent-clipped=1.0 2023-05-18 21:16:25,918 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312210.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:16:34,868 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312223.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:16:39,708 INFO [finetune.py:992] (0/2) Epoch 18, batch 4250, loss[loss=0.152, simple_loss=0.242, pruned_loss=0.03098, over 12095.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03648, over 2364244.92 frames. ], batch size: 32, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:16:47,185 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8584, 4.5846, 4.1846, 4.1875, 4.6778, 4.0211, 4.2195, 3.9270], device='cuda:0'), covar=tensor([0.1778, 0.1150, 0.1353, 0.2040, 0.1175, 0.2451, 0.1996, 0.1838], device='cuda:0'), in_proj_covar=tensor([0.0375, 0.0525, 0.0419, 0.0468, 0.0482, 0.0462, 0.0422, 0.0402], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:17:09,254 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312271.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:17:15,231 INFO [finetune.py:992] (0/2) Epoch 18, batch 4300, loss[loss=0.1587, simple_loss=0.2495, pruned_loss=0.03398, over 12290.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03636, over 2359383.97 frames. ], batch size: 33, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:17:27,616 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.814e+02 3.252e+02 3.764e+02 1.098e+03, threshold=6.505e+02, percent-clipped=2.0 2023-05-18 21:17:49,339 INFO [finetune.py:992] (0/2) Epoch 18, batch 4350, loss[loss=0.1669, simple_loss=0.2569, pruned_loss=0.03841, over 12357.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2536, pruned_loss=0.03679, over 2358245.45 frames. ], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:17:50,245 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1481, 5.0231, 5.0064, 5.0310, 4.6776, 5.0851, 5.0605, 5.2941], device='cuda:0'), covar=tensor([0.0191, 0.0157, 0.0165, 0.0362, 0.0695, 0.0333, 0.0150, 0.0169], device='cuda:0'), in_proj_covar=tensor([0.0202, 0.0203, 0.0196, 0.0253, 0.0246, 0.0228, 0.0182, 0.0237], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 21:17:57,320 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312341.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:18:24,906 INFO [finetune.py:992] (0/2) Epoch 18, batch 4400, loss[loss=0.1562, simple_loss=0.2489, pruned_loss=0.03178, over 12189.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2529, pruned_loss=0.03633, over 2361812.41 frames. 
], batch size: 35, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:18:26,431 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312382.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:18:31,783 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312389.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:18:37,864 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.591e+02 3.116e+02 3.636e+02 1.408e+03, threshold=6.232e+02, percent-clipped=3.0 2023-05-18 21:18:38,797 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7882, 2.6080, 4.8396, 5.1487, 3.3527, 2.6011, 2.9960, 2.1298], device='cuda:0'), covar=tensor([0.1849, 0.3816, 0.0436, 0.0310, 0.1128, 0.2757, 0.3381, 0.5586], device='cuda:0'), in_proj_covar=tensor([0.0313, 0.0400, 0.0285, 0.0310, 0.0284, 0.0328, 0.0411, 0.0387], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:19:00,508 INFO [finetune.py:992] (0/2) Epoch 18, batch 4450, loss[loss=0.1425, simple_loss=0.2292, pruned_loss=0.02789, over 11391.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03641, over 2359368.43 frames. ], batch size: 25, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:19:00,570 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312430.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:19:32,130 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.11 vs. limit=2.0 2023-05-18 21:19:35,198 INFO [finetune.py:992] (0/2) Epoch 18, batch 4500, loss[loss=0.1441, simple_loss=0.2293, pruned_loss=0.02938, over 12097.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2527, pruned_loss=0.03648, over 2361758.32 frames. ], batch size: 32, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:19:38,227 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2643, 3.9620, 3.9848, 4.2599, 2.9944, 3.9471, 2.7255, 4.0312], device='cuda:0'), covar=tensor([0.1572, 0.0754, 0.0837, 0.0650, 0.1143, 0.0598, 0.1708, 0.0896], device='cuda:0'), in_proj_covar=tensor([0.0235, 0.0277, 0.0307, 0.0368, 0.0250, 0.0251, 0.0268, 0.0380], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:19:43,090 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4725, 4.9422, 4.2438, 5.0621, 4.5617, 3.1487, 4.3302, 3.2362], device='cuda:0'), covar=tensor([0.0797, 0.0672, 0.1386, 0.0523, 0.1198, 0.1597, 0.1031, 0.3173], device='cuda:0'), in_proj_covar=tensor([0.0316, 0.0386, 0.0365, 0.0341, 0.0378, 0.0282, 0.0354, 0.0372], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:19:47,746 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.715e+02 3.190e+02 4.026e+02 2.285e+03, threshold=6.379e+02, percent-clipped=2.0 2023-05-18 21:19:56,198 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312510.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:20:10,553 INFO [finetune.py:992] (0/2) Epoch 18, batch 4550, loss[loss=0.1729, simple_loss=0.2642, pruned_loss=0.04079, over 12114.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2529, pruned_loss=0.03638, over 2368867.38 frames. 
], batch size: 38, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:20:30,597 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312558.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:20:41,729 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=312574.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:20:45,719 INFO [finetune.py:992] (0/2) Epoch 18, batch 4600, loss[loss=0.1513, simple_loss=0.2412, pruned_loss=0.03069, over 12178.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2525, pruned_loss=0.03619, over 2370321.38 frames. ], batch size: 31, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:20:48,737 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6821, 2.7747, 4.4693, 4.5741, 2.8503, 2.5528, 2.9150, 2.1014], device='cuda:0'), covar=tensor([0.1757, 0.3412, 0.0510, 0.0481, 0.1470, 0.2670, 0.3067, 0.4416], device='cuda:0'), in_proj_covar=tensor([0.0310, 0.0397, 0.0283, 0.0308, 0.0282, 0.0325, 0.0408, 0.0384], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:20:58,050 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.688e+02 3.108e+02 3.848e+02 5.963e+02, threshold=6.216e+02, percent-clipped=0.0 2023-05-18 21:21:20,503 INFO [finetune.py:992] (0/2) Epoch 18, batch 4650, loss[loss=0.1604, simple_loss=0.2443, pruned_loss=0.03821, over 12162.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.253, pruned_loss=0.03644, over 2372192.59 frames. ], batch size: 31, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:21:21,900 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5792, 5.0741, 5.5498, 4.8846, 5.1640, 5.0150, 5.5720, 5.0822], device='cuda:0'), covar=tensor([0.0265, 0.0429, 0.0275, 0.0277, 0.0448, 0.0339, 0.0192, 0.0347], device='cuda:0'), in_proj_covar=tensor([0.0283, 0.0289, 0.0311, 0.0282, 0.0283, 0.0279, 0.0254, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:21:24,024 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=312635.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 21:21:26,106 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=312638.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:21:49,080 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5000, 2.5311, 3.2190, 4.2946, 2.3397, 4.2858, 4.4997, 4.5569], device='cuda:0'), covar=tensor([0.0136, 0.1322, 0.0500, 0.0175, 0.1421, 0.0286, 0.0140, 0.0104], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0208, 0.0186, 0.0126, 0.0192, 0.0186, 0.0184, 0.0129], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:21:56,213 INFO [finetune.py:992] (0/2) Epoch 18, batch 4700, loss[loss=0.145, simple_loss=0.2407, pruned_loss=0.02461, over 12283.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2537, pruned_loss=0.03677, over 2376557.06 frames. 
], batch size: 33, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:22:08,466 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.540e+02 2.971e+02 3.808e+02 6.706e+02, threshold=5.942e+02, percent-clipped=2.0 2023-05-18 21:22:09,397 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=312699.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:22:30,885 INFO [finetune.py:992] (0/2) Epoch 18, batch 4750, loss[loss=0.1454, simple_loss=0.2433, pruned_loss=0.0238, over 12200.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2533, pruned_loss=0.03662, over 2377547.58 frames. ], batch size: 31, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:23:05,276 INFO [finetune.py:992] (0/2) Epoch 18, batch 4800, loss[loss=0.1771, simple_loss=0.2743, pruned_loss=0.03992, over 12287.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2526, pruned_loss=0.03674, over 2373351.77 frames. ], batch size: 37, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:23:17,508 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.574e+02 2.999e+02 3.689e+02 9.471e+02, threshold=5.998e+02, percent-clipped=4.0 2023-05-18 21:23:26,888 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-05-18 21:23:40,815 INFO [finetune.py:992] (0/2) Epoch 18, batch 4850, loss[loss=0.1727, simple_loss=0.2577, pruned_loss=0.0438, over 8008.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.252, pruned_loss=0.03645, over 2369322.50 frames. ], batch size: 98, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:24:15,431 INFO [finetune.py:992] (0/2) Epoch 18, batch 4900, loss[loss=0.1417, simple_loss=0.2255, pruned_loss=0.02893, over 12188.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2526, pruned_loss=0.03661, over 2367240.69 frames. ], batch size: 29, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:24:18,374 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8878, 4.2541, 3.7132, 4.4295, 4.0199, 2.7493, 3.8346, 2.8998], device='cuda:0'), covar=tensor([0.0942, 0.0842, 0.1460, 0.0725, 0.1357, 0.1853, 0.1196, 0.3423], device='cuda:0'), in_proj_covar=tensor([0.0316, 0.0387, 0.0368, 0.0344, 0.0380, 0.0284, 0.0355, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:24:27,428 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2159, 2.6012, 3.7760, 3.1255, 3.6304, 3.3274, 2.7201, 3.6698], device='cuda:0'), covar=tensor([0.0154, 0.0377, 0.0182, 0.0275, 0.0157, 0.0192, 0.0390, 0.0154], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0221, 0.0208, 0.0203, 0.0236, 0.0181, 0.0214, 0.0205], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:24:27,889 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.600e+02 3.047e+02 3.560e+02 6.137e+02, threshold=6.093e+02, percent-clipped=1.0 2023-05-18 21:24:50,413 INFO [finetune.py:992] (0/2) Epoch 18, batch 4950, loss[loss=0.1677, simple_loss=0.2626, pruned_loss=0.03644, over 12307.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2513, pruned_loss=0.03603, over 2372444.27 frames. 
], batch size: 34, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:24:50,482 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312930.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 21:25:26,162 INFO [finetune.py:992] (0/2) Epoch 18, batch 5000, loss[loss=0.1467, simple_loss=0.2302, pruned_loss=0.0316, over 12349.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.252, pruned_loss=0.03622, over 2356069.42 frames. ], batch size: 30, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:25:35,980 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312994.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:25:38,720 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.662e+02 3.140e+02 3.645e+02 1.191e+03, threshold=6.280e+02, percent-clipped=2.0 2023-05-18 21:25:38,942 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=312998.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:25:48,486 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8202, 2.9776, 4.8374, 4.8946, 2.9887, 2.7596, 2.9818, 2.4744], device='cuda:0'), covar=tensor([0.1699, 0.3035, 0.0416, 0.0450, 0.1333, 0.2513, 0.2989, 0.3898], device='cuda:0'), in_proj_covar=tensor([0.0311, 0.0397, 0.0283, 0.0309, 0.0281, 0.0325, 0.0408, 0.0385], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:26:01,188 INFO [finetune.py:992] (0/2) Epoch 18, batch 5050, loss[loss=0.1521, simple_loss=0.2448, pruned_loss=0.02969, over 12359.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2511, pruned_loss=0.03588, over 2365353.10 frames. ], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:26:05,676 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-05-18 21:26:21,324 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313059.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:26:24,702 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8355, 4.5580, 4.7264, 4.6796, 4.6194, 4.7383, 4.6798, 2.6109], device='cuda:0'), covar=tensor([0.0104, 0.0071, 0.0087, 0.0073, 0.0060, 0.0107, 0.0087, 0.0854], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0082, 0.0086, 0.0077, 0.0063, 0.0097, 0.0085, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:26:35,688 INFO [finetune.py:992] (0/2) Epoch 18, batch 5100, loss[loss=0.176, simple_loss=0.269, pruned_loss=0.04146, over 12336.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2516, pruned_loss=0.03616, over 2366709.66 frames. ], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:26:48,182 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.737e+02 3.179e+02 3.988e+02 8.125e+02, threshold=6.358e+02, percent-clipped=2.0 2023-05-18 21:27:12,266 INFO [finetune.py:992] (0/2) Epoch 18, batch 5150, loss[loss=0.173, simple_loss=0.2613, pruned_loss=0.04233, over 12035.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2515, pruned_loss=0.03616, over 2357427.15 frames. 
], batch size: 40, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:27:21,708 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5312, 3.2479, 4.9507, 2.5899, 2.7578, 3.6002, 3.0741, 3.6630], device='cuda:0'), covar=tensor([0.0491, 0.1320, 0.0470, 0.1357, 0.2169, 0.1772, 0.1576, 0.1388], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0245, 0.0268, 0.0190, 0.0248, 0.0306, 0.0233, 0.0278], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:27:41,019 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5315, 2.6071, 4.6670, 4.9020, 3.2334, 2.5507, 2.7972, 2.1245], device='cuda:0'), covar=tensor([0.1807, 0.3433, 0.0427, 0.0331, 0.1059, 0.2585, 0.3195, 0.4356], device='cuda:0'), in_proj_covar=tensor([0.0310, 0.0395, 0.0282, 0.0307, 0.0280, 0.0324, 0.0406, 0.0383], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:27:42,642 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 21:27:46,918 INFO [finetune.py:992] (0/2) Epoch 18, batch 5200, loss[loss=0.1639, simple_loss=0.2596, pruned_loss=0.03413, over 11810.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2521, pruned_loss=0.03646, over 2361446.22 frames. ], batch size: 44, lr: 3.24e-03, grad_scale: 16.0 2023-05-18 21:27:59,473 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.725e+02 3.162e+02 3.690e+02 6.131e+02, threshold=6.324e+02, percent-clipped=0.0 2023-05-18 21:28:21,615 INFO [finetune.py:992] (0/2) Epoch 18, batch 5250, loss[loss=0.1593, simple_loss=0.2525, pruned_loss=0.03309, over 12145.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2523, pruned_loss=0.03683, over 2351837.15 frames. ], batch size: 34, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:28:21,766 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313230.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:28:41,861 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6934, 2.7807, 4.7450, 4.9298, 2.9434, 2.6835, 3.0303, 2.2456], device='cuda:0'), covar=tensor([0.1725, 0.3363, 0.0416, 0.0372, 0.1303, 0.2567, 0.3071, 0.4226], device='cuda:0'), in_proj_covar=tensor([0.0310, 0.0395, 0.0281, 0.0307, 0.0281, 0.0323, 0.0407, 0.0383], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:28:50,031 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313269.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:28:51,413 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313271.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:28:56,132 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313278.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:28:57,513 INFO [finetune.py:992] (0/2) Epoch 18, batch 5300, loss[loss=0.1475, simple_loss=0.2357, pruned_loss=0.02963, over 12254.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2522, pruned_loss=0.03675, over 2356348.20 frames. 
], batch size: 32, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:29:02,406 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2251, 4.6159, 2.7774, 2.5089, 4.0422, 2.7737, 3.9372, 3.2464], device='cuda:0'), covar=tensor([0.0786, 0.0579, 0.1265, 0.1714, 0.0278, 0.1271, 0.0542, 0.0851], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0265, 0.0182, 0.0205, 0.0146, 0.0188, 0.0206, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:29:07,127 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313294.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:09,186 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0326, 6.0082, 5.7865, 5.2357, 5.1436, 5.9089, 5.5099, 5.3252], device='cuda:0'), covar=tensor([0.0702, 0.0920, 0.0706, 0.1624, 0.0794, 0.0654, 0.1422, 0.1013], device='cuda:0'), in_proj_covar=tensor([0.0654, 0.0579, 0.0536, 0.0656, 0.0440, 0.0754, 0.0811, 0.0588], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 21:29:09,503 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.28 vs. limit=5.0 2023-05-18 21:29:09,761 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.691e+02 3.047e+02 3.516e+02 5.591e+02, threshold=6.093e+02, percent-clipped=0.0 2023-05-18 21:29:31,955 INFO [finetune.py:992] (0/2) Epoch 18, batch 5350, loss[loss=0.1532, simple_loss=0.2344, pruned_loss=0.03602, over 11450.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2529, pruned_loss=0.03696, over 2351950.55 frames. ], batch size: 25, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:29:32,194 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313330.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:32,778 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1897, 5.0756, 5.0326, 5.0953, 4.6209, 5.1349, 5.0456, 5.3700], device='cuda:0'), covar=tensor([0.0238, 0.0166, 0.0189, 0.0379, 0.0829, 0.0465, 0.0212, 0.0164], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0206, 0.0199, 0.0256, 0.0250, 0.0230, 0.0185, 0.0239], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 21:29:33,524 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313332.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:29:40,436 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313342.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:43,347 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9846, 4.8551, 4.8144, 4.8863, 4.5183, 4.9535, 4.8533, 5.1485], device='cuda:0'), covar=tensor([0.0240, 0.0155, 0.0200, 0.0333, 0.0707, 0.0318, 0.0184, 0.0153], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0206, 0.0199, 0.0256, 0.0250, 0.0230, 0.0185, 0.0239], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 21:29:46,197 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313350.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:48,866 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313354.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:30:06,249 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, 
num_channels=96, metric=1.38 vs. limit=2.0 2023-05-18 21:30:07,064 INFO [finetune.py:992] (0/2) Epoch 18, batch 5400, loss[loss=0.1774, simple_loss=0.2747, pruned_loss=0.04004, over 12305.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2519, pruned_loss=0.03631, over 2361846.08 frames. ], batch size: 34, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:30:10,824 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.4929, 4.3668, 4.2263, 4.4851, 3.1196, 4.1007, 2.8055, 4.2642], device='cuda:0'), covar=tensor([0.1495, 0.0587, 0.0873, 0.0628, 0.1123, 0.0580, 0.1763, 0.0913], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0275, 0.0306, 0.0368, 0.0248, 0.0250, 0.0267, 0.0379], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:30:20,019 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.584e+02 2.894e+02 3.600e+02 7.897e+02, threshold=5.788e+02, percent-clipped=2.0 2023-05-18 21:30:29,979 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313411.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:30:34,742 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2661, 5.1098, 5.1804, 5.2544, 4.9207, 4.9454, 4.7010, 5.1394], device='cuda:0'), covar=tensor([0.0744, 0.0606, 0.0900, 0.0582, 0.1955, 0.1289, 0.0587, 0.1217], device='cuda:0'), in_proj_covar=tensor([0.0574, 0.0746, 0.0656, 0.0667, 0.0893, 0.0792, 0.0594, 0.0511], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-18 21:30:42,856 INFO [finetune.py:992] (0/2) Epoch 18, batch 5450, loss[loss=0.1745, simple_loss=0.2637, pruned_loss=0.04263, over 12089.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2517, pruned_loss=0.03629, over 2361684.34 frames. ], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:31:17,027 INFO [finetune.py:992] (0/2) Epoch 18, batch 5500, loss[loss=0.1476, simple_loss=0.2368, pruned_loss=0.0292, over 12084.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2534, pruned_loss=0.03734, over 2364957.03 frames. ], batch size: 32, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:31:24,036 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1523, 6.0732, 5.8935, 5.3550, 5.2333, 6.0367, 5.6548, 5.3739], device='cuda:0'), covar=tensor([0.0776, 0.1052, 0.0678, 0.1668, 0.0791, 0.0664, 0.1423, 0.0960], device='cuda:0'), in_proj_covar=tensor([0.0660, 0.0584, 0.0539, 0.0662, 0.0443, 0.0759, 0.0816, 0.0593], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:0') 2023-05-18 21:31:29,300 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.547e+02 3.061e+02 3.924e+02 7.161e+02, threshold=6.121e+02, percent-clipped=6.0 2023-05-18 21:31:32,576 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.98 vs. limit=5.0 2023-05-18 21:31:51,526 INFO [finetune.py:992] (0/2) Epoch 18, batch 5550, loss[loss=0.1958, simple_loss=0.2874, pruned_loss=0.05206, over 12078.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.254, pruned_loss=0.03748, over 2373110.88 frames. 
], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:32:18,952 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2390, 4.9339, 5.1791, 5.1307, 4.9920, 5.1941, 5.0567, 3.0872], device='cuda:0'), covar=tensor([0.0098, 0.0069, 0.0068, 0.0064, 0.0050, 0.0103, 0.0099, 0.0658], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0081, 0.0085, 0.0076, 0.0063, 0.0096, 0.0084, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:32:28,009 INFO [finetune.py:992] (0/2) Epoch 18, batch 5600, loss[loss=0.1682, simple_loss=0.2715, pruned_loss=0.03244, over 12344.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2532, pruned_loss=0.03708, over 2371875.82 frames. ], batch size: 36, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:32:40,618 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.476e+02 2.852e+02 3.313e+02 5.624e+02, threshold=5.704e+02, percent-clipped=0.0 2023-05-18 21:32:59,578 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313625.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:33:00,949 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313627.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 21:33:02,405 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0381, 6.0111, 5.7591, 5.2588, 5.1587, 5.9417, 5.5125, 5.3275], device='cuda:0'), covar=tensor([0.0811, 0.1007, 0.0708, 0.1743, 0.0762, 0.0689, 0.1708, 0.1032], device='cuda:0'), in_proj_covar=tensor([0.0655, 0.0581, 0.0536, 0.0657, 0.0440, 0.0754, 0.0812, 0.0588], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 21:33:02,989 INFO [finetune.py:992] (0/2) Epoch 18, batch 5650, loss[loss=0.1658, simple_loss=0.2646, pruned_loss=0.0335, over 12142.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2532, pruned_loss=0.03683, over 2375169.45 frames. ], batch size: 34, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:33:20,052 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313654.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:33:38,035 INFO [finetune.py:992] (0/2) Epoch 18, batch 5700, loss[loss=0.1481, simple_loss=0.2302, pruned_loss=0.03303, over 12132.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2522, pruned_loss=0.03651, over 2371958.42 frames. ], batch size: 30, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:33:51,685 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.438e+02 2.940e+02 3.390e+02 5.441e+02, threshold=5.879e+02, percent-clipped=0.0 2023-05-18 21:33:54,472 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313702.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:33:57,360 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313706.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:34:13,519 INFO [finetune.py:992] (0/2) Epoch 18, batch 5750, loss[loss=0.1277, simple_loss=0.2088, pruned_loss=0.02329, over 12295.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2525, pruned_loss=0.0367, over 2375024.95 frames. ], batch size: 28, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:34:22,082 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. 
limit=5.0 2023-05-18 21:34:47,672 INFO [finetune.py:992] (0/2) Epoch 18, batch 5800, loss[loss=0.1623, simple_loss=0.253, pruned_loss=0.03586, over 12109.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2529, pruned_loss=0.03683, over 2379298.83 frames. ], batch size: 33, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:34:57,565 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313794.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:00,092 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.651e+02 3.293e+02 3.863e+02 6.789e+02, threshold=6.585e+02, percent-clipped=3.0 2023-05-18 21:35:02,862 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6609, 3.2933, 5.1028, 2.5566, 2.8647, 3.7072, 3.0230, 3.7441], device='cuda:0'), covar=tensor([0.0412, 0.1232, 0.0263, 0.1287, 0.1969, 0.1580, 0.1541, 0.1233], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0244, 0.0265, 0.0188, 0.0246, 0.0304, 0.0232, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:35:18,478 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313824.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:22,597 INFO [finetune.py:992] (0/2) Epoch 18, batch 5850, loss[loss=0.1675, simple_loss=0.2604, pruned_loss=0.03733, over 12191.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2532, pruned_loss=0.037, over 2378102.56 frames. ], batch size: 35, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:35:41,058 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313855.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:47,373 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313864.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:58,129 INFO [finetune.py:992] (0/2) Epoch 18, batch 5900, loss[loss=0.2273, simple_loss=0.3078, pruned_loss=0.07338, over 8088.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2539, pruned_loss=0.03738, over 2371553.32 frames. 
], batch size: 97, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:36:01,755 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313885.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:36:10,482 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.656e+02 3.061e+02 3.514e+02 6.124e+02, threshold=6.123e+02, percent-clipped=0.0 2023-05-18 21:36:29,371 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313925.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:36:29,428 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313925.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:36:30,731 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313927.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:36:31,405 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0162, 4.8816, 4.8622, 4.7811, 4.5001, 4.9920, 4.9293, 5.1480], device='cuda:0'), covar=tensor([0.0212, 0.0160, 0.0193, 0.0407, 0.0757, 0.0332, 0.0157, 0.0185], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0206, 0.0199, 0.0257, 0.0249, 0.0231, 0.0184, 0.0240], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 21:36:32,521 INFO [finetune.py:992] (0/2) Epoch 18, batch 5950, loss[loss=0.1858, simple_loss=0.2742, pruned_loss=0.0487, over 11343.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2542, pruned_loss=0.03704, over 2373029.36 frames. ], batch size: 55, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:36:45,891 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8398, 4.7379, 4.6554, 4.7927, 3.7063, 4.9777, 4.8666, 5.0463], device='cuda:0'), covar=tensor([0.0338, 0.0247, 0.0275, 0.0433, 0.1479, 0.0398, 0.0244, 0.0239], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0207, 0.0199, 0.0258, 0.0250, 0.0231, 0.0185, 0.0240], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 21:37:02,055 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313973.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:37:03,486 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313975.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:37:06,873 INFO [finetune.py:992] (0/2) Epoch 18, batch 6000, loss[loss=0.1797, simple_loss=0.2713, pruned_loss=0.0441, over 12197.00 frames. ], tot_loss[loss=0.1649, simple_loss=0.2551, pruned_loss=0.03736, over 2367049.07 frames. ], batch size: 35, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:37:06,874 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 21:37:24,912 INFO [finetune.py:1026] (0/2) Epoch 18, validation: loss=0.3118, simple_loss=0.3886, pruned_loss=0.1174, over 1020973.00 frames. 
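The finetune.py:992 entries above give, for each reported batch, the current batch's loss (the loss[...] block, over the frame count shown) and a tot_loss[...] aggregated over the much larger recent-frame count, while the finetune.py:1026 entries give the periodic validation loss. A minimal sketch for pulling those numbers out of a log in this format, assuming the messages keep the layout shown here; the regexes and the parse_losses helper below are illustrative names, not icefall code:

    import re

    # Patterns written against the message layout visible in this log;
    # they are assumptions about the format, not icefall utilities.
    TRAIN_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([\d.]+)")
    VALID_RE = re.compile(r"Epoch (\d+), validation: loss=([\d.]+)")

    def parse_losses(path):
        """Collect (epoch, batch, tot_loss) and (epoch, valid_loss) tuples."""
        train, valid = [], []
        with open(path) as f:
            for line in f:
                m = TRAIN_RE.search(line)
                if m:
                    e, b, loss = m.groups()
                    train.append((int(e), int(b), float(loss)))
                    continue
                m = VALID_RE.search(line)
                if m:
                    e, loss = m.groups()
                    valid.append((int(e), float(loss)))
        return train, valid

    # Note: where an entry is wrapped across physical lines (as in this excerpt),
    # the file would first need to be re-joined into one line per timestamp.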
2023-05-18 21:37:24,913 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12508MB 2023-05-18 21:37:37,448 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.807e+02 3.210e+02 3.782e+02 1.181e+03, threshold=6.421e+02, percent-clipped=7.0 2023-05-18 21:37:39,138 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-214000.pt 2023-05-18 21:37:46,413 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314006.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:38:02,870 INFO [finetune.py:992] (0/2) Epoch 18, batch 6050, loss[loss=0.1573, simple_loss=0.2484, pruned_loss=0.03312, over 12189.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2544, pruned_loss=0.03718, over 2369415.04 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:38:19,443 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314054.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:38:37,371 INFO [finetune.py:992] (0/2) Epoch 18, batch 6100, loss[loss=0.1658, simple_loss=0.2482, pruned_loss=0.04173, over 12335.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2542, pruned_loss=0.03747, over 2367928.16 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:38:51,129 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.400e+02 2.937e+02 3.569e+02 7.911e+02, threshold=5.875e+02, percent-clipped=1.0 2023-05-18 21:39:04,990 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3863, 4.2292, 4.3012, 4.5618, 3.1015, 4.2104, 2.8067, 4.2958], device='cuda:0'), covar=tensor([0.1641, 0.0684, 0.0856, 0.0601, 0.1172, 0.0537, 0.1735, 0.1066], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0275, 0.0304, 0.0365, 0.0248, 0.0249, 0.0266, 0.0378], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:39:09,250 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8108, 2.9607, 4.6765, 4.8092, 2.8643, 2.6679, 3.0628, 2.2443], device='cuda:0'), covar=tensor([0.1723, 0.3109, 0.0453, 0.0436, 0.1462, 0.2632, 0.2800, 0.4204], device='cuda:0'), in_proj_covar=tensor([0.0310, 0.0396, 0.0283, 0.0308, 0.0281, 0.0324, 0.0407, 0.0383], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:39:13,065 INFO [finetune.py:992] (0/2) Epoch 18, batch 6150, loss[loss=0.1782, simple_loss=0.2571, pruned_loss=0.04964, over 12128.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2552, pruned_loss=0.03778, over 2365440.15 frames. ], batch size: 39, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:39:27,079 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314150.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:39:47,817 INFO [finetune.py:992] (0/2) Epoch 18, batch 6200, loss[loss=0.1562, simple_loss=0.2508, pruned_loss=0.03077, over 10651.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2551, pruned_loss=0.03782, over 2367235.92 frames. 
], batch size: 69, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:39:47,902 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314180.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:01,104 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.588e+02 3.111e+02 3.927e+02 7.134e+02, threshold=6.221e+02, percent-clipped=2.0 2023-05-18 21:40:02,244 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314200.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:15,631 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314220.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:22,469 INFO [finetune.py:992] (0/2) Epoch 18, batch 6250, loss[loss=0.1417, simple_loss=0.2231, pruned_loss=0.0302, over 12155.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2537, pruned_loss=0.03724, over 2378945.75 frames. ], batch size: 29, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:40:45,171 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=314261.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:58,297 INFO [finetune.py:992] (0/2) Epoch 18, batch 6300, loss[loss=0.1716, simple_loss=0.2603, pruned_loss=0.04147, over 12245.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2539, pruned_loss=0.03707, over 2377586.19 frames. ], batch size: 32, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:41:11,489 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.601e+02 2.924e+02 3.480e+02 6.548e+02, threshold=5.848e+02, percent-clipped=1.0 2023-05-18 21:41:24,732 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5121, 5.0084, 5.4969, 4.7984, 5.0863, 4.9037, 5.5249, 5.1165], device='cuda:0'), covar=tensor([0.0247, 0.0428, 0.0252, 0.0269, 0.0400, 0.0331, 0.0184, 0.0285], device='cuda:0'), in_proj_covar=tensor([0.0282, 0.0289, 0.0312, 0.0282, 0.0282, 0.0281, 0.0255, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:41:32,883 INFO [finetune.py:992] (0/2) Epoch 18, batch 6350, loss[loss=0.1333, simple_loss=0.2194, pruned_loss=0.02363, over 12173.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2534, pruned_loss=0.03682, over 2384335.78 frames. ], batch size: 29, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:41:54,116 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1270, 3.9408, 2.6223, 2.2217, 3.5835, 2.3387, 3.5611, 2.8490], device='cuda:0'), covar=tensor([0.0723, 0.0661, 0.1206, 0.1793, 0.0303, 0.1563, 0.0570, 0.0942], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0268, 0.0182, 0.0206, 0.0147, 0.0189, 0.0206, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:42:07,287 INFO [finetune.py:992] (0/2) Epoch 18, batch 6400, loss[loss=0.1343, simple_loss=0.2172, pruned_loss=0.0257, over 12222.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.253, pruned_loss=0.03657, over 2376643.27 frames. ], batch size: 29, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:42:21,461 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.738e+02 3.152e+02 3.622e+02 1.272e+03, threshold=6.305e+02, percent-clipped=1.0 2023-05-18 21:42:43,361 INFO [finetune.py:992] (0/2) Epoch 18, batch 6450, loss[loss=0.1793, simple_loss=0.2691, pruned_loss=0.04474, over 11882.00 frames. 
], tot_loss[loss=0.1636, simple_loss=0.2534, pruned_loss=0.03688, over 2376131.27 frames. ], batch size: 44, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:42:57,296 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314450.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:10,227 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=5.02 vs. limit=5.0 2023-05-18 21:43:18,139 INFO [finetune.py:992] (0/2) Epoch 18, batch 6500, loss[loss=0.1282, simple_loss=0.2205, pruned_loss=0.01797, over 12181.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2528, pruned_loss=0.03666, over 2369621.77 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:43:18,268 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314480.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:30,915 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314498.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:31,506 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.650e+02 3.059e+02 3.575e+02 7.984e+02, threshold=6.118e+02, percent-clipped=1.0 2023-05-18 21:43:45,957 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314520.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:52,047 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314528.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:53,422 INFO [finetune.py:992] (0/2) Epoch 18, batch 6550, loss[loss=0.1743, simple_loss=0.2675, pruned_loss=0.04051, over 12350.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2534, pruned_loss=0.03703, over 2368007.21 frames. ], batch size: 36, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:43:57,870 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314536.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:11,920 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314556.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:20,024 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314568.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:20,189 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7576, 3.6571, 3.3744, 3.3072, 2.9445, 2.8594, 3.7562, 2.6422], device='cuda:0'), covar=tensor([0.0390, 0.0136, 0.0184, 0.0213, 0.0394, 0.0360, 0.0120, 0.0485], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0170, 0.0173, 0.0198, 0.0209, 0.0204, 0.0180, 0.0210], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:44:28,291 INFO [finetune.py:992] (0/2) Epoch 18, batch 6600, loss[loss=0.1736, simple_loss=0.2666, pruned_loss=0.04029, over 12122.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2541, pruned_loss=0.03729, over 2364821.95 frames. ], batch size: 39, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:44:40,292 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=314597.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:41,490 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.516e+02 3.007e+02 3.648e+02 7.193e+02, threshold=6.013e+02, percent-clipped=1.0 2023-05-18 21:45:03,482 INFO [finetune.py:992] (0/2) Epoch 18, batch 6650, loss[loss=0.1534, simple_loss=0.2407, pruned_loss=0.03306, over 12167.00 frames. 
], tot_loss[loss=0.164, simple_loss=0.2537, pruned_loss=0.03717, over 2363557.52 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:45:38,802 INFO [finetune.py:992] (0/2) Epoch 18, batch 6700, loss[loss=0.1493, simple_loss=0.2395, pruned_loss=0.02951, over 12350.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2534, pruned_loss=0.03689, over 2360311.13 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:45:52,590 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.657e+02 3.017e+02 3.709e+02 7.789e+02, threshold=6.033e+02, percent-clipped=3.0 2023-05-18 21:46:14,030 INFO [finetune.py:992] (0/2) Epoch 18, batch 6750, loss[loss=0.1662, simple_loss=0.2686, pruned_loss=0.0319, over 12175.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2541, pruned_loss=0.03711, over 2362562.06 frames. ], batch size: 35, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:46:22,735 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.9338, 4.6199, 4.2742, 4.1129, 4.7138, 3.9921, 4.1893, 4.0372], device='cuda:0'), covar=tensor([0.1802, 0.1232, 0.1593, 0.2274, 0.1208, 0.2642, 0.2057, 0.1616], device='cuda:0'), in_proj_covar=tensor([0.0382, 0.0534, 0.0426, 0.0470, 0.0489, 0.0469, 0.0428, 0.0410], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:46:43,566 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0254, 4.6358, 4.7344, 4.8910, 4.7234, 4.9594, 4.8002, 2.8286], device='cuda:0'), covar=tensor([0.0091, 0.0076, 0.0102, 0.0069, 0.0052, 0.0098, 0.0166, 0.0752], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0083, 0.0088, 0.0078, 0.0064, 0.0098, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:46:49,114 INFO [finetune.py:992] (0/2) Epoch 18, batch 6800, loss[loss=0.1418, simple_loss=0.2253, pruned_loss=0.02915, over 12329.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2533, pruned_loss=0.03688, over 2359423.88 frames. ], batch size: 30, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:47:02,248 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.521e+02 2.899e+02 3.298e+02 8.690e+02, threshold=5.798e+02, percent-clipped=2.0 2023-05-18 21:47:25,178 INFO [finetune.py:992] (0/2) Epoch 18, batch 6850, loss[loss=0.1549, simple_loss=0.2472, pruned_loss=0.03126, over 12165.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2538, pruned_loss=0.03676, over 2357892.10 frames. ], batch size: 36, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:47:39,981 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.88 vs. limit=5.0 2023-05-18 21:47:43,880 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314856.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:00,059 INFO [finetune.py:992] (0/2) Epoch 18, batch 6900, loss[loss=0.1879, simple_loss=0.2857, pruned_loss=0.0451, over 8099.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2538, pruned_loss=0.03639, over 2359792.15 frames. 
], batch size: 97, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:48:08,469 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314892.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:13,841 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.648e+02 3.153e+02 3.942e+02 5.673e+02, threshold=6.307e+02, percent-clipped=0.0 2023-05-18 21:48:16,537 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314904.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:21,853 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0 2023-05-18 21:48:34,563 INFO [finetune.py:992] (0/2) Epoch 18, batch 6950, loss[loss=0.1574, simple_loss=0.2395, pruned_loss=0.03759, over 12124.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2537, pruned_loss=0.0366, over 2360546.90 frames. ], batch size: 30, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:48:37,506 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314934.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:56,399 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314960.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:49:03,243 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4802, 5.4715, 5.3442, 4.8307, 4.8866, 5.4465, 5.0368, 4.9617], device='cuda:0'), covar=tensor([0.0907, 0.1001, 0.0682, 0.1578, 0.1083, 0.0776, 0.1830, 0.1089], device='cuda:0'), in_proj_covar=tensor([0.0664, 0.0587, 0.0539, 0.0664, 0.0442, 0.0762, 0.0820, 0.0592], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 21:49:10,220 INFO [finetune.py:992] (0/2) Epoch 18, batch 7000, loss[loss=0.1332, simple_loss=0.2258, pruned_loss=0.02033, over 12174.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2536, pruned_loss=0.03646, over 2366622.65 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:49:21,165 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=314995.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:49:24,531 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.740e+02 3.038e+02 3.572e+02 5.545e+02, threshold=6.076e+02, percent-clipped=0.0 2023-05-18 21:49:39,574 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315021.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 21:49:45,682 INFO [finetune.py:992] (0/2) Epoch 18, batch 7050, loss[loss=0.1543, simple_loss=0.2504, pruned_loss=0.02908, over 12076.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2536, pruned_loss=0.03651, over 2363093.81 frames. 
], batch size: 32, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:49:45,849 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315030.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:49:57,803 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2874, 4.7115, 3.0861, 2.7679, 4.0754, 2.5021, 4.0434, 3.2226], device='cuda:0'), covar=tensor([0.0803, 0.0686, 0.1152, 0.1707, 0.0359, 0.1564, 0.0542, 0.0925], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0267, 0.0182, 0.0206, 0.0147, 0.0189, 0.0207, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:50:03,103 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8459, 5.7276, 5.3462, 5.2550, 5.8467, 5.1265, 5.0934, 5.1825], device='cuda:0'), covar=tensor([0.1604, 0.0967, 0.1059, 0.1717, 0.0886, 0.2244, 0.2346, 0.1379], device='cuda:0'), in_proj_covar=tensor([0.0378, 0.0529, 0.0424, 0.0467, 0.0486, 0.0467, 0.0427, 0.0408], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:50:20,631 INFO [finetune.py:992] (0/2) Epoch 18, batch 7100, loss[loss=0.1336, simple_loss=0.2126, pruned_loss=0.02726, over 12029.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2537, pruned_loss=0.03667, over 2363109.13 frames. ], batch size: 28, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:50:28,579 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315091.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:50:34,547 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.567e+02 2.898e+02 3.446e+02 5.668e+02, threshold=5.796e+02, percent-clipped=0.0 2023-05-18 21:50:55,445 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8960, 3.9083, 3.9435, 4.0374, 3.7849, 3.8376, 3.6746, 3.9181], device='cuda:0'), covar=tensor([0.1470, 0.0720, 0.1311, 0.0674, 0.1619, 0.1216, 0.0609, 0.1060], device='cuda:0'), in_proj_covar=tensor([0.0559, 0.0738, 0.0645, 0.0652, 0.0877, 0.0774, 0.0583, 0.0500], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:50:56,763 INFO [finetune.py:992] (0/2) Epoch 18, batch 7150, loss[loss=0.1366, simple_loss=0.2193, pruned_loss=0.02691, over 12348.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2534, pruned_loss=0.03644, over 2367668.43 frames. 
], batch size: 30, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:51:11,095 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9079, 3.4220, 5.2816, 2.7699, 2.8876, 3.9143, 3.4081, 3.9406], device='cuda:0'), covar=tensor([0.0461, 0.1185, 0.0297, 0.1156, 0.1985, 0.1615, 0.1320, 0.1175], device='cuda:0'), in_proj_covar=tensor([0.0241, 0.0241, 0.0263, 0.0186, 0.0242, 0.0300, 0.0230, 0.0273], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:51:25,633 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5475, 5.1559, 5.5553, 4.8358, 5.1609, 4.9543, 5.5714, 5.1347], device='cuda:0'), covar=tensor([0.0272, 0.0391, 0.0270, 0.0278, 0.0394, 0.0341, 0.0210, 0.0293], device='cuda:0'), in_proj_covar=tensor([0.0283, 0.0288, 0.0312, 0.0281, 0.0281, 0.0280, 0.0254, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:51:27,036 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315173.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:51:28,709 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 21:51:31,671 INFO [finetune.py:992] (0/2) Epoch 18, batch 7200, loss[loss=0.1482, simple_loss=0.2305, pruned_loss=0.03293, over 12038.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2529, pruned_loss=0.03625, over 2371138.96 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:51:34,644 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5786, 2.8621, 3.8625, 4.7049, 3.9633, 4.6826, 3.8839, 3.3301], device='cuda:0'), covar=tensor([0.0045, 0.0396, 0.0141, 0.0038, 0.0143, 0.0074, 0.0146, 0.0411], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0127, 0.0109, 0.0084, 0.0110, 0.0121, 0.0106, 0.0145], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:51:40,019 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315192.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:51:45,345 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.526e+02 2.907e+02 3.593e+02 5.562e+02, threshold=5.814e+02, percent-clipped=0.0 2023-05-18 21:51:48,590 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1234, 6.0006, 5.5477, 5.5213, 6.0845, 5.3206, 5.4813, 5.5475], device='cuda:0'), covar=tensor([0.1649, 0.0948, 0.1051, 0.1835, 0.0889, 0.2124, 0.2041, 0.1250], device='cuda:0'), in_proj_covar=tensor([0.0376, 0.0527, 0.0422, 0.0466, 0.0484, 0.0464, 0.0427, 0.0408], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 21:52:06,700 INFO [finetune.py:992] (0/2) Epoch 18, batch 7250, loss[loss=0.1446, simple_loss=0.2363, pruned_loss=0.0265, over 12110.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.252, pruned_loss=0.03618, over 2370697.62 frames. 
], batch size: 33, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:52:09,640 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315234.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:52:13,712 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315240.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:52:42,438 INFO [finetune.py:992] (0/2) Epoch 18, batch 7300, loss[loss=0.169, simple_loss=0.2565, pruned_loss=0.04076, over 12143.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2529, pruned_loss=0.03657, over 2362855.19 frames. ], batch size: 38, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:52:49,452 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315290.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:52:56,196 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.573e+02 3.018e+02 3.634e+02 9.167e+02, threshold=6.037e+02, percent-clipped=2.0 2023-05-18 21:53:07,227 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315316.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:53:12,902 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315324.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:16,981 INFO [finetune.py:992] (0/2) Epoch 18, batch 7350, loss[loss=0.1593, simple_loss=0.2454, pruned_loss=0.03661, over 12174.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2535, pruned_loss=0.03669, over 2364479.67 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:53:22,258 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.46 vs. limit=2.0 2023-05-18 21:53:29,452 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 21:53:47,549 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315374.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:51,517 INFO [finetune.py:992] (0/2) Epoch 18, batch 7400, loss[loss=0.141, simple_loss=0.2336, pruned_loss=0.02416, over 12111.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2525, pruned_loss=0.03633, over 2374577.37 frames. ], batch size: 33, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:53:53,027 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315382.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:55,142 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315385.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:55,723 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315386.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:54:05,942 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.520e+02 3.033e+02 3.513e+02 9.241e+02, threshold=6.066e+02, percent-clipped=3.0 2023-05-18 21:54:14,112 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5040, 4.7995, 4.2187, 5.1023, 4.6676, 3.1672, 4.2491, 3.1885], device='cuda:0'), covar=tensor([0.0808, 0.0835, 0.1576, 0.0638, 0.1207, 0.1677, 0.1314, 0.3483], device='cuda:0'), in_proj_covar=tensor([0.0320, 0.0390, 0.0371, 0.0348, 0.0384, 0.0283, 0.0359, 0.0377], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:54:27,600 INFO [finetune.py:992] (0/2) Epoch 18, batch 7450, loss[loss=0.1414, simple_loss=0.239, pruned_loss=0.02189, over 12189.00 frames. 
], tot_loss[loss=0.1623, simple_loss=0.2523, pruned_loss=0.03619, over 2375943.42 frames. ], batch size: 35, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:54:31,329 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315435.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:54:36,905 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315443.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:55:00,513 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.4491, 3.5809, 3.1437, 3.0527, 2.7584, 2.6248, 3.5728, 2.3949], device='cuda:0'), covar=tensor([0.0451, 0.0171, 0.0253, 0.0249, 0.0479, 0.0490, 0.0160, 0.0526], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0173, 0.0175, 0.0200, 0.0211, 0.0207, 0.0185, 0.0213], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:55:02,368 INFO [finetune.py:992] (0/2) Epoch 18, batch 7500, loss[loss=0.1642, simple_loss=0.25, pruned_loss=0.03917, over 12015.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2528, pruned_loss=0.03639, over 2374452.69 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:55:16,092 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.622e+02 2.675e+02 3.112e+02 3.844e+02 9.516e+02, threshold=6.223e+02, percent-clipped=2.0 2023-05-18 21:55:36,153 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315529.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:55:36,788 INFO [finetune.py:992] (0/2) Epoch 18, batch 7550, loss[loss=0.1605, simple_loss=0.259, pruned_loss=0.03098, over 12369.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2526, pruned_loss=0.0361, over 2381474.46 frames. ], batch size: 35, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:55:56,664 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-18 21:56:12,517 INFO [finetune.py:992] (0/2) Epoch 18, batch 7600, loss[loss=0.1661, simple_loss=0.2566, pruned_loss=0.03775, over 12027.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2527, pruned_loss=0.0364, over 2379780.08 frames. ], batch size: 42, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:56:19,728 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315590.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:56:26,710 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.697e+02 3.069e+02 3.511e+02 7.016e+02, threshold=6.139e+02, percent-clipped=2.0 2023-05-18 21:56:38,077 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315616.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:56:47,509 INFO [finetune.py:992] (0/2) Epoch 18, batch 7650, loss[loss=0.1472, simple_loss=0.2315, pruned_loss=0.03146, over 12133.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2524, pruned_loss=0.03626, over 2382032.99 frames. 
], batch size: 30, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:56:53,138 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315638.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:56:55,503 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0550, 4.6184, 3.8997, 4.7787, 4.1793, 2.7504, 4.0045, 2.8592], device='cuda:0'), covar=tensor([0.1005, 0.0741, 0.1571, 0.0578, 0.1366, 0.1967, 0.1265, 0.3726], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0387, 0.0369, 0.0346, 0.0381, 0.0280, 0.0356, 0.0375], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:56:59,200 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-05-18 21:57:03,832 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3809, 5.2115, 5.2442, 5.3656, 5.0304, 5.0492, 4.8022, 5.2364], device='cuda:0'), covar=tensor([0.0665, 0.0625, 0.0822, 0.0593, 0.1738, 0.1313, 0.0531, 0.1243], device='cuda:0'), in_proj_covar=tensor([0.0572, 0.0756, 0.0658, 0.0666, 0.0900, 0.0791, 0.0598, 0.0512], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-18 21:57:11,570 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315664.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:19,474 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0498, 4.8882, 4.9845, 5.0389, 4.7364, 4.7426, 4.5291, 4.8744], device='cuda:0'), covar=tensor([0.0704, 0.0608, 0.0898, 0.0586, 0.1691, 0.1326, 0.0535, 0.1259], device='cuda:0'), in_proj_covar=tensor([0.0573, 0.0757, 0.0659, 0.0667, 0.0902, 0.0792, 0.0599, 0.0512], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-18 21:57:22,764 INFO [finetune.py:992] (0/2) Epoch 18, batch 7700, loss[loss=0.1458, simple_loss=0.2316, pruned_loss=0.02999, over 11781.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2539, pruned_loss=0.03705, over 2367380.94 frames. ], batch size: 26, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:57:22,839 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315680.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:26,968 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315686.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:37,198 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.612e+02 3.026e+02 3.917e+02 7.864e+02, threshold=6.052e+02, percent-clipped=3.0 2023-05-18 21:57:49,078 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1261, 3.6377, 5.3982, 2.8951, 3.0739, 3.9457, 3.6258, 3.9416], device='cuda:0'), covar=tensor([0.0370, 0.1128, 0.0295, 0.1164, 0.1957, 0.1681, 0.1229, 0.1261], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0242, 0.0265, 0.0187, 0.0243, 0.0301, 0.0231, 0.0274], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:57:58,406 INFO [finetune.py:992] (0/2) Epoch 18, batch 7750, loss[loss=0.1556, simple_loss=0.2479, pruned_loss=0.03161, over 12340.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2541, pruned_loss=0.03674, over 2373725.28 frames. 
], batch size: 36, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:57:58,487 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315730.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:58:01,092 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315734.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:58:03,921 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315738.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:58:33,084 INFO [finetune.py:992] (0/2) Epoch 18, batch 7800, loss[loss=0.1481, simple_loss=0.2396, pruned_loss=0.02825, over 12030.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2538, pruned_loss=0.03718, over 2363582.61 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:58:41,549 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1821, 2.4609, 3.0441, 4.0846, 2.1679, 4.1321, 4.0592, 4.2131], device='cuda:0'), covar=tensor([0.0144, 0.1214, 0.0525, 0.0153, 0.1413, 0.0272, 0.0224, 0.0118], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0208, 0.0187, 0.0125, 0.0192, 0.0186, 0.0185, 0.0129], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 21:58:46,894 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.693e+02 3.248e+02 3.855e+02 6.960e+02, threshold=6.497e+02, percent-clipped=3.0 2023-05-18 21:59:01,662 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-05-18 21:59:07,424 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315829.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:59:07,960 INFO [finetune.py:992] (0/2) Epoch 18, batch 7850, loss[loss=0.1514, simple_loss=0.2424, pruned_loss=0.03019, over 12150.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2529, pruned_loss=0.03673, over 2371036.72 frames. ], batch size: 36, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:59:17,919 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2573, 4.0896, 4.0198, 4.3479, 2.9959, 3.9923, 2.5976, 3.9732], device='cuda:0'), covar=tensor([0.1531, 0.0672, 0.0856, 0.0655, 0.1139, 0.0561, 0.1792, 0.0961], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0272, 0.0302, 0.0360, 0.0246, 0.0246, 0.0263, 0.0373], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 21:59:41,992 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315877.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:59:44,007 INFO [finetune.py:992] (0/2) Epoch 18, batch 7900, loss[loss=0.1659, simple_loss=0.2584, pruned_loss=0.03669, over 12278.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.253, pruned_loss=0.03668, over 2379432.04 frames. 
], batch size: 33, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:59:57,750 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.526e+02 2.964e+02 3.737e+02 6.219e+02, threshold=5.928e+02, percent-clipped=0.0 2023-05-18 22:00:06,783 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315913.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:00:15,880 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3924, 4.8173, 4.1712, 5.0696, 4.5198, 2.8903, 4.3180, 3.0010], device='cuda:0'), covar=tensor([0.0767, 0.0602, 0.1505, 0.0446, 0.1307, 0.1732, 0.1056, 0.3465], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0386, 0.0369, 0.0344, 0.0380, 0.0280, 0.0355, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:00:18,444 INFO [finetune.py:992] (0/2) Epoch 18, batch 7950, loss[loss=0.1494, simple_loss=0.2512, pruned_loss=0.02387, over 12304.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2529, pruned_loss=0.03669, over 2370865.56 frames. ], batch size: 34, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:00:27,379 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-05-18 22:00:43,317 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9379, 3.3881, 5.2507, 2.7220, 2.8806, 3.8470, 3.3296, 3.8175], device='cuda:0'), covar=tensor([0.0391, 0.1178, 0.0303, 0.1229, 0.1957, 0.1633, 0.1381, 0.1306], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0243, 0.0267, 0.0188, 0.0245, 0.0302, 0.0232, 0.0276], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:00:49,296 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7513, 3.4146, 3.4960, 3.6786, 3.7100, 3.7243, 3.5369, 2.6756], device='cuda:0'), covar=tensor([0.0148, 0.0162, 0.0167, 0.0112, 0.0071, 0.0145, 0.0107, 0.0755], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0084, 0.0089, 0.0078, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:00:49,340 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315974.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:00:53,379 INFO [finetune.py:992] (0/2) Epoch 18, batch 8000, loss[loss=0.1541, simple_loss=0.2364, pruned_loss=0.03589, over 12191.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2524, pruned_loss=0.0365, over 2375055.82 frames. 
], batch size: 29, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:00:53,499 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315980.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:08,517 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.505e+02 2.918e+02 3.507e+02 5.646e+02, threshold=5.837e+02, percent-clipped=0.0 2023-05-18 22:01:08,837 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-216000.pt 2023-05-18 22:01:21,469 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=316014.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:31,078 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316028.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:32,385 INFO [finetune.py:992] (0/2) Epoch 18, batch 8050, loss[loss=0.1536, simple_loss=0.2506, pruned_loss=0.02831, over 12015.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2521, pruned_loss=0.03632, over 2370997.08 frames. ], batch size: 40, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:01:32,498 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316030.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:38,042 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316038.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:03,960 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=316075.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:05,942 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316078.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:07,323 INFO [finetune.py:992] (0/2) Epoch 18, batch 8100, loss[loss=0.1526, simple_loss=0.2491, pruned_loss=0.02808, over 12276.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2512, pruned_loss=0.0363, over 2372836.54 frames. ], batch size: 37, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:02:11,430 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316086.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:20,969 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.492e+02 2.909e+02 3.620e+02 8.244e+02, threshold=5.819e+02, percent-clipped=2.0 2023-05-18 22:02:41,687 INFO [finetune.py:992] (0/2) Epoch 18, batch 8150, loss[loss=0.1742, simple_loss=0.2678, pruned_loss=0.04024, over 11108.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2523, pruned_loss=0.03672, over 2366400.49 frames. ], batch size: 55, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:03:17,438 INFO [finetune.py:992] (0/2) Epoch 18, batch 8200, loss[loss=0.1711, simple_loss=0.2586, pruned_loss=0.04179, over 12410.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2526, pruned_loss=0.03704, over 2367710.01 frames. ], batch size: 32, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:03:31,287 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.709e+02 3.241e+02 3.987e+02 7.613e+02, threshold=6.482e+02, percent-clipped=5.0 2023-05-18 22:03:52,110 INFO [finetune.py:992] (0/2) Epoch 18, batch 8250, loss[loss=0.1634, simple_loss=0.2534, pruned_loss=0.03671, over 12097.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.03692, over 2375416.44 frames. 
], batch size: 32, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:04:19,473 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=316269.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:04:26,925 INFO [finetune.py:992] (0/2) Epoch 18, batch 8300, loss[loss=0.1774, simple_loss=0.2707, pruned_loss=0.04208, over 11679.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2535, pruned_loss=0.03682, over 2369290.76 frames. ], batch size: 48, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:04:42,135 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.639e+02 2.605e+02 3.028e+02 3.553e+02 8.114e+02, threshold=6.056e+02, percent-clipped=2.0 2023-05-18 22:04:53,067 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-05-18 22:05:02,690 INFO [finetune.py:992] (0/2) Epoch 18, batch 8350, loss[loss=0.147, simple_loss=0.2433, pruned_loss=0.02534, over 12166.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2536, pruned_loss=0.03704, over 2365363.27 frames. ], batch size: 36, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:05:14,584 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0806, 4.6315, 5.0879, 4.4087, 4.7460, 4.5365, 5.1071, 4.7385], device='cuda:0'), covar=tensor([0.0344, 0.0472, 0.0291, 0.0318, 0.0457, 0.0348, 0.0196, 0.0498], device='cuda:0'), in_proj_covar=tensor([0.0284, 0.0290, 0.0311, 0.0281, 0.0281, 0.0280, 0.0254, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:05:16,765 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2724, 3.9724, 4.1270, 4.4136, 2.9391, 4.0288, 2.5690, 4.0090], device='cuda:0'), covar=tensor([0.1779, 0.0858, 0.0876, 0.0581, 0.1297, 0.0673, 0.2130, 0.1323], device='cuda:0'), in_proj_covar=tensor([0.0232, 0.0274, 0.0304, 0.0363, 0.0247, 0.0247, 0.0264, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:05:22,212 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.50 vs. limit=5.0 2023-05-18 22:05:30,338 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=316370.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:05:37,168 INFO [finetune.py:992] (0/2) Epoch 18, batch 8400, loss[loss=0.1759, simple_loss=0.2653, pruned_loss=0.04328, over 12348.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2542, pruned_loss=0.037, over 2371216.88 frames. ], batch size: 36, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:05:50,998 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.662e+02 2.560e+02 2.944e+02 3.523e+02 7.440e+02, threshold=5.888e+02, percent-clipped=2.0 2023-05-18 22:06:13,245 INFO [finetune.py:992] (0/2) Epoch 18, batch 8450, loss[loss=0.1804, simple_loss=0.2618, pruned_loss=0.04952, over 12196.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2537, pruned_loss=0.03699, over 2361200.45 frames. 
], batch size: 35, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:06:14,765 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0318, 2.4589, 3.5939, 2.9960, 3.4118, 3.1941, 2.5563, 3.5517], device='cuda:0'), covar=tensor([0.0149, 0.0398, 0.0174, 0.0276, 0.0164, 0.0204, 0.0375, 0.0132], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0216, 0.0206, 0.0201, 0.0235, 0.0181, 0.0208, 0.0204], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:06:32,018 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2562, 4.5809, 2.7789, 2.6064, 3.9190, 2.6142, 3.8635, 3.1868], device='cuda:0'), covar=tensor([0.0804, 0.0519, 0.1345, 0.1673, 0.0296, 0.1369, 0.0541, 0.0902], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0266, 0.0180, 0.0203, 0.0146, 0.0187, 0.0205, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:06:48,142 INFO [finetune.py:992] (0/2) Epoch 18, batch 8500, loss[loss=0.1489, simple_loss=0.2407, pruned_loss=0.02853, over 12085.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2531, pruned_loss=0.03676, over 2367110.36 frames. ], batch size: 32, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:07:01,941 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.609e+02 3.108e+02 3.624e+02 7.556e+02, threshold=6.217e+02, percent-clipped=4.0 2023-05-18 22:07:13,015 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=316516.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:07:22,410 INFO [finetune.py:992] (0/2) Epoch 18, batch 8550, loss[loss=0.1398, simple_loss=0.2211, pruned_loss=0.0293, over 12354.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2541, pruned_loss=0.03708, over 2366705.24 frames. ], batch size: 30, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:07:49,668 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316569.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:07:55,861 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=316577.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:07:57,699 INFO [finetune.py:992] (0/2) Epoch 18, batch 8600, loss[loss=0.1757, simple_loss=0.2669, pruned_loss=0.04222, over 12071.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2542, pruned_loss=0.03725, over 2358574.91 frames. ], batch size: 42, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:08:11,461 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.393e+02 2.785e+02 3.424e+02 7.085e+02, threshold=5.569e+02, percent-clipped=2.0 2023-05-18 22:08:22,302 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0345, 4.8744, 4.7635, 4.9121, 4.5108, 5.0427, 4.9811, 5.1542], device='cuda:0'), covar=tensor([0.0189, 0.0152, 0.0232, 0.0339, 0.0813, 0.0306, 0.0158, 0.0172], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0208, 0.0201, 0.0258, 0.0252, 0.0233, 0.0186, 0.0241], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 22:08:23,620 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316617.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:08:32,584 INFO [finetune.py:992] (0/2) Epoch 18, batch 8650, loss[loss=0.1598, simple_loss=0.2537, pruned_loss=0.03293, over 12270.00 frames. 
], tot_loss[loss=0.1635, simple_loss=0.2538, pruned_loss=0.03667, over 2366185.80 frames. ], batch size: 37, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:08:47,084 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-18 22:09:00,723 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316670.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:09:07,616 INFO [finetune.py:992] (0/2) Epoch 18, batch 8700, loss[loss=0.1596, simple_loss=0.2529, pruned_loss=0.03311, over 12094.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2539, pruned_loss=0.03692, over 2358440.71 frames. ], batch size: 38, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:09:16,445 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.66 vs. limit=2.0 2023-05-18 22:09:21,422 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.739e+02 3.075e+02 3.684e+02 6.683e+02, threshold=6.151e+02, percent-clipped=3.0 2023-05-18 22:09:22,479 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.68 vs. limit=2.0 2023-05-18 22:09:29,467 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-18 22:09:34,475 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316718.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:09:34,846 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-18 22:09:38,779 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4494, 2.4691, 3.0967, 4.3158, 2.2610, 4.2154, 4.4172, 4.4258], device='cuda:0'), covar=tensor([0.0156, 0.1365, 0.0561, 0.0163, 0.1453, 0.0307, 0.0157, 0.0110], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0210, 0.0189, 0.0128, 0.0195, 0.0189, 0.0187, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:09:43,353 INFO [finetune.py:992] (0/2) Epoch 18, batch 8750, loss[loss=0.1752, simple_loss=0.2714, pruned_loss=0.03954, over 12034.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2548, pruned_loss=0.0372, over 2361645.06 frames. ], batch size: 42, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:10:17,793 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-18 22:10:17,999 INFO [finetune.py:992] (0/2) Epoch 18, batch 8800, loss[loss=0.1666, simple_loss=0.2577, pruned_loss=0.0377, over 12122.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2536, pruned_loss=0.03671, over 2370040.50 frames. 
], batch size: 38, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:10:31,562 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.658e+02 2.959e+02 3.676e+02 1.888e+03, threshold=5.918e+02, percent-clipped=2.0 2023-05-18 22:10:39,643 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2030, 4.5070, 2.6666, 2.5611, 3.8458, 2.5992, 3.8384, 3.0173], device='cuda:0'), covar=tensor([0.0800, 0.0680, 0.1343, 0.1649, 0.0355, 0.1365, 0.0578, 0.0943], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0264, 0.0180, 0.0202, 0.0145, 0.0187, 0.0204, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:10:47,017 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5908, 2.6157, 3.1557, 4.4477, 2.3743, 4.2411, 4.5219, 4.5741], device='cuda:0'), covar=tensor([0.0127, 0.1311, 0.0553, 0.0155, 0.1461, 0.0323, 0.0142, 0.0099], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0209, 0.0189, 0.0127, 0.0194, 0.0189, 0.0186, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:10:52,448 INFO [finetune.py:992] (0/2) Epoch 18, batch 8850, loss[loss=0.1748, simple_loss=0.273, pruned_loss=0.03832, over 12276.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2531, pruned_loss=0.03671, over 2369238.85 frames. ], batch size: 37, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:11:22,787 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=316872.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:11:28,238 INFO [finetune.py:992] (0/2) Epoch 18, batch 8900, loss[loss=0.1645, simple_loss=0.2546, pruned_loss=0.03723, over 12345.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2544, pruned_loss=0.0374, over 2363516.83 frames. ], batch size: 36, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:11:33,439 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-18 22:11:34,731 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3999, 2.4743, 3.6619, 4.4489, 3.7456, 4.3497, 3.7489, 3.1694], device='cuda:0'), covar=tensor([0.0045, 0.0423, 0.0152, 0.0038, 0.0151, 0.0076, 0.0140, 0.0344], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0127, 0.0109, 0.0084, 0.0110, 0.0121, 0.0105, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:11:42,056 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.606e+02 2.717e+02 3.132e+02 3.823e+02 7.560e+02, threshold=6.264e+02, percent-clipped=1.0 2023-05-18 22:11:43,571 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=316902.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:12:02,469 INFO [finetune.py:992] (0/2) Epoch 18, batch 8950, loss[loss=0.1497, simple_loss=0.2459, pruned_loss=0.02672, over 12074.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2547, pruned_loss=0.03741, over 2367744.96 frames. ], batch size: 42, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:12:07,783 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. 
limit=2.0 2023-05-18 22:12:08,688 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9901, 4.8875, 4.7744, 4.8646, 4.2047, 4.9827, 4.9199, 5.1247], device='cuda:0'), covar=tensor([0.0300, 0.0177, 0.0230, 0.0401, 0.1190, 0.0412, 0.0206, 0.0241], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0208, 0.0201, 0.0258, 0.0251, 0.0233, 0.0186, 0.0241], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 22:12:12,716 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5896, 2.7187, 3.6918, 4.5889, 3.8682, 4.5161, 3.8106, 3.2996], device='cuda:0'), covar=tensor([0.0041, 0.0414, 0.0145, 0.0040, 0.0131, 0.0088, 0.0150, 0.0370], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0127, 0.0109, 0.0084, 0.0110, 0.0121, 0.0105, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:12:23,689 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3772, 4.7531, 2.9114, 2.7735, 4.1029, 2.7385, 4.0556, 3.2103], device='cuda:0'), covar=tensor([0.0817, 0.0546, 0.1311, 0.1542, 0.0283, 0.1325, 0.0558, 0.0883], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0264, 0.0181, 0.0203, 0.0145, 0.0187, 0.0204, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:12:25,105 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=316963.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:12:29,467 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.83 vs. limit=5.0 2023-05-18 22:12:36,657 INFO [finetune.py:992] (0/2) Epoch 18, batch 9000, loss[loss=0.1329, simple_loss=0.2251, pruned_loss=0.02035, over 12026.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2535, pruned_loss=0.0368, over 2377018.77 frames. ], batch size: 31, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:12:36,658 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 22:12:48,354 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([1.8704, 3.3637, 3.5987, 4.0582, 2.8483, 3.5472, 2.5544, 3.4807], device='cuda:0'), covar=tensor([0.2102, 0.1170, 0.0997, 0.0525, 0.1359, 0.0881, 0.2072, 0.1159], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0275, 0.0305, 0.0364, 0.0248, 0.0249, 0.0265, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:12:54,883 INFO [finetune.py:1026] (0/2) Epoch 18, validation: loss=0.319, simple_loss=0.3929, pruned_loss=0.1225, over 1020973.00 frames. 
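The three loss fields reported by finetune.py are not independent: throughout this log the headline loss equals, to within display rounding, 0.5 * simple_loss + pruned_loss, i.e. a fixed-weight combination of the simple transducer loss and the pruned RNN-T loss. A quick numerical check against the validation line just above (a sanity check only, not icefall code):

```python
# Check, using the validation values logged above, that
#   loss ≈ 0.5 * simple_loss + pruned_loss   (to within display rounding).
simple_loss, pruned_loss, logged_loss = 0.3929, 0.1225, 0.319
combined = 0.5 * simple_loss + pruned_loss   # = 0.31895
assert abs(combined - logged_loss) < 5e-4
```

The same relation holds for the per-batch and tot_loss entries, e.g. 0.5 * 0.2544 + 0.0374 ≈ 0.1646 for batch 8900 below.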
2023-05-18 22:12:54,883 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12508MB 2023-05-18 22:13:02,188 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7723, 2.9479, 4.6709, 4.7801, 2.8105, 2.5370, 2.9602, 2.1172], device='cuda:0'), covar=tensor([0.1710, 0.3075, 0.0458, 0.0427, 0.1450, 0.2659, 0.3024, 0.4365], device='cuda:0'), in_proj_covar=tensor([0.0312, 0.0397, 0.0286, 0.0313, 0.0283, 0.0327, 0.0411, 0.0388], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:13:08,548 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.672e+02 3.168e+02 4.016e+02 6.334e+02, threshold=6.336e+02, percent-clipped=1.0 2023-05-18 22:13:11,872 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317004.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:13:18,663 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1539, 6.0240, 5.6311, 5.5770, 6.0790, 5.4021, 5.5454, 5.5454], device='cuda:0'), covar=tensor([0.1453, 0.0906, 0.1079, 0.1922, 0.0912, 0.2064, 0.1964, 0.1322], device='cuda:0'), in_proj_covar=tensor([0.0377, 0.0531, 0.0423, 0.0469, 0.0483, 0.0465, 0.0426, 0.0410], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:13:29,539 INFO [finetune.py:992] (0/2) Epoch 18, batch 9050, loss[loss=0.142, simple_loss=0.2311, pruned_loss=0.0264, over 12270.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2537, pruned_loss=0.03709, over 2375627.22 frames. ], batch size: 32, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:13:53,784 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317065.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:14:04,051 INFO [finetune.py:992] (0/2) Epoch 18, batch 9100, loss[loss=0.1791, simple_loss=0.2637, pruned_loss=0.04727, over 12366.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.254, pruned_loss=0.03709, over 2373921.64 frames. ], batch size: 36, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:14:17,889 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.637e+02 3.104e+02 3.914e+02 6.162e+02, threshold=6.208e+02, percent-clipped=0.0 2023-05-18 22:14:37,017 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-05-18 22:14:40,024 INFO [finetune.py:992] (0/2) Epoch 18, batch 9150, loss[loss=0.1518, simple_loss=0.2502, pruned_loss=0.0267, over 11450.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2545, pruned_loss=0.03747, over 2360729.87 frames. ], batch size: 55, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:14:48,477 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8343, 3.4468, 5.2112, 2.7190, 3.0092, 3.9058, 3.2877, 3.8796], device='cuda:0'), covar=tensor([0.0429, 0.1153, 0.0365, 0.1253, 0.1877, 0.1536, 0.1359, 0.1131], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0242, 0.0266, 0.0187, 0.0243, 0.0301, 0.0230, 0.0274], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:15:09,328 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317172.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:15:15,084 INFO [finetune.py:992] (0/2) Epoch 18, batch 9200, loss[loss=0.1655, simple_loss=0.2614, pruned_loss=0.03482, over 12296.00 frames. 
], tot_loss[loss=0.1647, simple_loss=0.2546, pruned_loss=0.03735, over 2361935.00 frames. ], batch size: 34, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:15:28,997 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.558e+02 3.038e+02 3.693e+02 1.521e+03, threshold=6.076e+02, percent-clipped=5.0 2023-05-18 22:15:43,238 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317220.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:15:49,940 INFO [finetune.py:992] (0/2) Epoch 18, batch 9250, loss[loss=0.1763, simple_loss=0.2618, pruned_loss=0.04544, over 12118.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2542, pruned_loss=0.03723, over 2360907.05 frames. ], batch size: 39, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:16:04,029 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317250.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:16:10,549 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317258.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:16:12,860 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7009, 3.2816, 5.1684, 2.5347, 2.8638, 3.8135, 3.1861, 3.9002], device='cuda:0'), covar=tensor([0.0472, 0.1292, 0.0271, 0.1326, 0.1954, 0.1547, 0.1526, 0.1183], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0243, 0.0267, 0.0187, 0.0243, 0.0301, 0.0230, 0.0275], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:16:13,191 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2023-05-18 22:16:25,730 INFO [finetune.py:992] (0/2) Epoch 18, batch 9300, loss[loss=0.1731, simple_loss=0.2636, pruned_loss=0.04128, over 12205.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.254, pruned_loss=0.03738, over 2357703.90 frames. ], batch size: 35, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:16:38,151 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317298.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:16:39,418 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.630e+02 3.100e+02 3.705e+02 6.033e+02, threshold=6.201e+02, percent-clipped=0.0 2023-05-18 22:16:47,391 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317311.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:17:02,254 INFO [finetune.py:992] (0/2) Epoch 18, batch 9350, loss[loss=0.1517, simple_loss=0.2541, pruned_loss=0.02468, over 12284.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2533, pruned_loss=0.03672, over 2370479.29 frames. 
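The recurring optim.py:368 lines ("Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...") summarize gradient-norm statistics over recent batches. In every instance above the reported threshold is, to within rounding, 2.0 times the middle quartile (for the line just logged, 2.0 × 3.038e+02 = 6.076e+02), so the clipping threshold tracks a running median of the total gradient norm, and percent-clipped reflects how often that threshold was exceeded. A hypothetical re-implementation of that bookkeeping, not the actual icefall optimizer:

```python
# Sketch of median-based gradient clipping with quartile / percent-clipped
# style reporting, analogous to the optim.py log lines above.  The history
# length and the exact clipping rule are assumptions.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, history=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)    # recent total gradient norms
        self.clipped = deque(maxlen=history)  # 1.0 where a step was clipped

    def __call__(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])).item()
        self.norms.append(norm)

        q = sorted(self.norms)
        median = q[len(q) // 2]
        threshold = self.clipping_scale * median  # e.g. 2.0 * 3.038e+02 = 6.076e+02

        clipped = norm > threshold
        self.clipped.append(1.0 if clipped else 0.0)
        if clipped:  # shrink all gradients so the total norm equals the threshold
            for p in params:
                p.grad.mul_(threshold / norm)
        percent_clipped = 100.0 * sum(self.clipped) / len(self.clipped)
        return threshold, percent_clipped
```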
], batch size: 37, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:17:21,651 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.0814, 3.9729, 4.1659, 4.4556, 3.0052, 3.9666, 2.7706, 4.2589], device='cuda:0'), covar=tensor([0.1674, 0.0766, 0.0784, 0.0596, 0.1161, 0.0625, 0.1751, 0.0978], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0272, 0.0302, 0.0362, 0.0247, 0.0247, 0.0263, 0.0372], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:17:22,324 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317359.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:17:22,823 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317360.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:17:26,413 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0166, 4.6730, 4.7226, 4.8210, 4.6967, 4.8668, 4.7597, 2.5987], device='cuda:0'), covar=tensor([0.0109, 0.0067, 0.0106, 0.0071, 0.0056, 0.0111, 0.0087, 0.0896], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0083, 0.0088, 0.0078, 0.0064, 0.0098, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:17:36,596 INFO [finetune.py:992] (0/2) Epoch 18, batch 9400, loss[loss=0.1762, simple_loss=0.2627, pruned_loss=0.04483, over 12117.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2537, pruned_loss=0.03698, over 2367753.83 frames. ], batch size: 39, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:17:50,593 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.641e+02 3.089e+02 3.636e+02 5.893e+02, threshold=6.177e+02, percent-clipped=0.0 2023-05-18 22:18:06,888 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.93 vs. limit=5.0 2023-05-18 22:18:12,810 INFO [finetune.py:992] (0/2) Epoch 18, batch 9450, loss[loss=0.1758, simple_loss=0.2687, pruned_loss=0.04141, over 12016.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2533, pruned_loss=0.03676, over 2371460.77 frames. ], batch size: 40, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:18:17,475 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.30 vs. limit=5.0 2023-05-18 22:18:27,031 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3184, 4.9905, 5.1327, 5.0820, 5.0219, 5.2244, 5.1127, 2.8598], device='cuda:0'), covar=tensor([0.0109, 0.0061, 0.0063, 0.0062, 0.0047, 0.0099, 0.0081, 0.0745], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0083, 0.0088, 0.0078, 0.0064, 0.0099, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:18:47,719 INFO [finetune.py:992] (0/2) Epoch 18, batch 9500, loss[loss=0.1387, simple_loss=0.2312, pruned_loss=0.02314, over 12107.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2525, pruned_loss=0.03641, over 2371910.81 frames. ], batch size: 33, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:19:01,323 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.574e+02 2.985e+02 3.737e+02 6.283e+02, threshold=5.970e+02, percent-clipped=1.0 2023-05-18 22:19:21,821 INFO [finetune.py:992] (0/2) Epoch 18, batch 9550, loss[loss=0.1805, simple_loss=0.2729, pruned_loss=0.04405, over 11620.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.0369, over 2360256.41 frames. 
], batch size: 48, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:19:22,620 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2205, 6.0346, 5.5525, 5.5144, 6.1396, 5.3973, 5.5216, 5.5421], device='cuda:0'), covar=tensor([0.1438, 0.0901, 0.1012, 0.1895, 0.0847, 0.2332, 0.1796, 0.1295], device='cuda:0'), in_proj_covar=tensor([0.0373, 0.0526, 0.0420, 0.0466, 0.0477, 0.0460, 0.0420, 0.0406], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:19:36,629 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-18 22:19:42,508 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317558.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:19:57,721 INFO [finetune.py:992] (0/2) Epoch 18, batch 9600, loss[loss=0.155, simple_loss=0.2459, pruned_loss=0.03209, over 12265.00 frames. ], tot_loss[loss=0.163, simple_loss=0.253, pruned_loss=0.03648, over 2366591.89 frames. ], batch size: 32, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:20:05,094 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0919, 5.9434, 5.4973, 5.4324, 6.0662, 5.2526, 5.5576, 5.3820], device='cuda:0'), covar=tensor([0.1601, 0.0935, 0.1210, 0.1978, 0.0915, 0.2301, 0.1806, 0.1261], device='cuda:0'), in_proj_covar=tensor([0.0375, 0.0530, 0.0422, 0.0468, 0.0480, 0.0462, 0.0422, 0.0408], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:20:07,370 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6507, 2.8092, 4.4777, 4.6793, 2.8589, 2.6170, 3.0268, 2.1708], device='cuda:0'), covar=tensor([0.1837, 0.3100, 0.0567, 0.0451, 0.1432, 0.2640, 0.2950, 0.4367], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0399, 0.0288, 0.0315, 0.0285, 0.0328, 0.0412, 0.0389], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:20:11,256 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.543e+02 3.144e+02 3.767e+02 6.808e+02, threshold=6.287e+02, percent-clipped=1.0 2023-05-18 22:20:15,783 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317606.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:20:15,800 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317606.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:20:17,659 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-18 22:20:32,459 INFO [finetune.py:992] (0/2) Epoch 18, batch 9650, loss[loss=0.1499, simple_loss=0.2315, pruned_loss=0.03413, over 12347.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2516, pruned_loss=0.03634, over 2373872.42 frames. 
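The zipformer.py:625 messages trace per-stack layer dropping: each encoder stack reports its warmup window (warmup_begin/warmup_end, in batches), the running batch_count, and which layers, if any, are skipped for the current batch (usually num_to_drop=0; occasionally a single index, such as layers_to_drop={2} above). A generic sketch of this kind of whole-layer dropout; the drop probability and selection rule here are assumptions rather than the actual Zipformer logic:

```python
# Hypothetical whole-layer dropout with the same style of reporting as the
# zipformer.py "num_to_drop / layers_to_drop" lines above.
import random
import torch.nn as nn

class SkippableStack(nn.Module):
    def __init__(self, layers, drop_prob=0.05):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.drop_prob = drop_prob     # assumed; not the Zipformer schedule
        self.batch_count = 0

    def forward(self, x):
        self.batch_count += 1
        layers_to_drop = set()
        if self.training and random.random() < self.drop_prob:
            layers_to_drop.add(random.randrange(len(self.layers)))
        print(f"batch_count={self.batch_count}, "
              f"num_to_drop={len(layers_to_drop)}, layers_to_drop={layers_to_drop}")
        for i, layer in enumerate(self.layers):
            if i not in layers_to_drop:   # dropped layers act as the identity
                x = layer(x)
        return x
```

Randomly bypassing a whole layer for a batch is a stochastic-depth-style regularizer; in this stretch of the log it fires only occasionally.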
], batch size: 30, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:20:49,071 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317654.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:20:53,381 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317660.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:21:05,086 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1147, 6.0758, 5.8396, 5.3667, 5.2072, 6.0086, 5.6036, 5.3390], device='cuda:0'), covar=tensor([0.0689, 0.0884, 0.0593, 0.1653, 0.0767, 0.0749, 0.1539, 0.1097], device='cuda:0'), in_proj_covar=tensor([0.0662, 0.0590, 0.0537, 0.0667, 0.0443, 0.0765, 0.0818, 0.0592], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:0') 2023-05-18 22:21:07,087 INFO [finetune.py:992] (0/2) Epoch 18, batch 9700, loss[loss=0.1485, simple_loss=0.232, pruned_loss=0.03255, over 11795.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2525, pruned_loss=0.03663, over 2369992.50 frames. ], batch size: 26, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:21:20,752 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.740e+02 3.173e+02 3.841e+02 6.003e+02, threshold=6.347e+02, percent-clipped=0.0 2023-05-18 22:21:27,536 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317708.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:21:38,825 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2911, 4.1945, 4.1487, 4.5164, 3.0588, 4.2036, 2.7896, 4.3781], device='cuda:0'), covar=tensor([0.1590, 0.0669, 0.0952, 0.0684, 0.1186, 0.0543, 0.1733, 0.0931], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0271, 0.0303, 0.0362, 0.0247, 0.0246, 0.0264, 0.0372], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:21:42,767 INFO [finetune.py:992] (0/2) Epoch 18, batch 9750, loss[loss=0.165, simple_loss=0.2631, pruned_loss=0.0335, over 10446.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03639, over 2373764.51 frames. ], batch size: 68, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:22:11,836 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.86 vs. limit=5.0 2023-05-18 22:22:17,561 INFO [finetune.py:992] (0/2) Epoch 18, batch 9800, loss[loss=0.1581, simple_loss=0.2558, pruned_loss=0.03015, over 12192.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2528, pruned_loss=0.03631, over 2374566.70 frames. 
], batch size: 35, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:22:23,253 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5485, 3.2939, 4.9673, 2.6965, 2.7647, 3.8019, 3.0881, 3.8953], device='cuda:0'), covar=tensor([0.0437, 0.1230, 0.0327, 0.1177, 0.1991, 0.1382, 0.1467, 0.1150], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0243, 0.0268, 0.0188, 0.0243, 0.0300, 0.0231, 0.0274], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:22:25,931 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317792.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:22:31,285 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.623e+02 3.208e+02 4.036e+02 5.837e+02, threshold=6.416e+02, percent-clipped=0.0 2023-05-18 22:22:52,261 INFO [finetune.py:992] (0/2) Epoch 18, batch 9850, loss[loss=0.1476, simple_loss=0.2286, pruned_loss=0.03327, over 12368.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2531, pruned_loss=0.03672, over 2367400.60 frames. ], batch size: 30, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:23:01,495 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.50 vs. limit=2.0 2023-05-18 22:23:09,341 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317853.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:23:27,783 INFO [finetune.py:992] (0/2) Epoch 18, batch 9900, loss[loss=0.1848, simple_loss=0.2697, pruned_loss=0.04992, over 12366.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2543, pruned_loss=0.03721, over 2374720.24 frames. ], batch size: 38, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:23:30,677 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317884.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:23:32,309 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.21 vs. limit=5.0 2023-05-18 22:23:39,732 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0234, 4.6646, 4.8150, 4.9137, 4.7751, 4.9754, 4.9247, 2.7009], device='cuda:0'), covar=tensor([0.0097, 0.0086, 0.0091, 0.0061, 0.0048, 0.0091, 0.0092, 0.0779], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0085, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:23:41,465 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.738e+02 3.183e+02 3.821e+02 1.015e+03, threshold=6.366e+02, percent-clipped=1.0 2023-05-18 22:23:45,832 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317906.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:24:02,195 INFO [finetune.py:992] (0/2) Epoch 18, batch 9950, loss[loss=0.1785, simple_loss=0.2657, pruned_loss=0.0457, over 12067.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2545, pruned_loss=0.03743, over 2365852.79 frames. 
], batch size: 42, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:24:12,693 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317945.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:24:18,958 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317954.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:24:18,990 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317954.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:24:36,617 INFO [finetune.py:992] (0/2) Epoch 18, batch 10000, loss[loss=0.2122, simple_loss=0.2947, pruned_loss=0.06483, over 8443.00 frames. ], tot_loss[loss=0.1656, simple_loss=0.255, pruned_loss=0.03806, over 2345852.43 frames. ], batch size: 98, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:24:41,282 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8963, 3.6555, 3.7537, 3.8761, 3.6128, 3.9679, 3.9219, 4.0195], device='cuda:0'), covar=tensor([0.0282, 0.0254, 0.0191, 0.0422, 0.0573, 0.0560, 0.0190, 0.0247], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0210, 0.0202, 0.0259, 0.0253, 0.0234, 0.0187, 0.0242], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 22:24:51,149 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.631e+02 3.060e+02 3.701e+02 7.144e+02, threshold=6.120e+02, percent-clipped=4.0 2023-05-18 22:24:51,489 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-218000.pt 2023-05-18 22:24:55,594 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=318002.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:25:14,821 INFO [finetune.py:992] (0/2) Epoch 18, batch 10050, loss[loss=0.1408, simple_loss=0.2277, pruned_loss=0.02701, over 12268.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2537, pruned_loss=0.03735, over 2355813.13 frames. ], batch size: 32, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:25:21,784 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1800, 4.0086, 4.1789, 4.4389, 3.1119, 4.0555, 2.8003, 4.1902], device='cuda:0'), covar=tensor([0.1648, 0.0814, 0.0817, 0.0639, 0.1113, 0.0615, 0.1771, 0.0925], device='cuda:0'), in_proj_covar=tensor([0.0230, 0.0269, 0.0299, 0.0361, 0.0245, 0.0246, 0.0262, 0.0370], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:25:49,342 INFO [finetune.py:992] (0/2) Epoch 18, batch 10100, loss[loss=0.1659, simple_loss=0.2639, pruned_loss=0.03394, over 11915.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.253, pruned_loss=0.03683, over 2363003.77 frames. 
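Intermediate checkpoints are written under the experiment directory and named by the global training batch index, 2000 batches apart (checkpoint-218000.pt here, checkpoint-220000.pt later in the log), in addition to the per-epoch epoch-18.pt file. A minimal sketch of that periodic saving, assuming a plain dict-of-state_dicts layout rather than the exact icefall checkpoint format:

```python
# Hypothetical periodic checkpointing, mirroring the
# "Saving checkpoint to .../checkpoint-218000.pt" lines above.
import torch

def maybe_save_checkpoint(batch_idx_train, model, optimizer, scheduler, scaler,
                          exp_dir, save_every_n=2000):
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "grad_scaler": scaler.state_dict(),   # so fp16 scaling resumes cleanly
        "batch_idx_train": batch_idx_train,
    }
    torch.save(state, f"{exp_dir}/checkpoint-{batch_idx_train}.pt")
```

Keeping optimizer, scheduler, and grad-scaler state alongside the model weights is what allows a later run to resume mid-epoch rather than only at epoch boundaries.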
], batch size: 44, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:26:03,140 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.661e+02 3.113e+02 3.876e+02 6.644e+02, threshold=6.226e+02, percent-clipped=1.0 2023-05-18 22:26:07,272 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8998, 5.7614, 5.4305, 5.2706, 5.9018, 4.9968, 5.4281, 5.2883], device='cuda:0'), covar=tensor([0.1525, 0.0876, 0.0983, 0.1746, 0.0823, 0.2459, 0.1467, 0.1109], device='cuda:0'), in_proj_covar=tensor([0.0373, 0.0524, 0.0420, 0.0463, 0.0477, 0.0461, 0.0418, 0.0406], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:26:23,637 INFO [finetune.py:992] (0/2) Epoch 18, batch 10150, loss[loss=0.1605, simple_loss=0.2577, pruned_loss=0.03163, over 12060.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2528, pruned_loss=0.0366, over 2368380.93 frames. ], batch size: 37, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:26:37,325 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318148.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:26:59,516 INFO [finetune.py:992] (0/2) Epoch 18, batch 10200, loss[loss=0.1684, simple_loss=0.2638, pruned_loss=0.03644, over 12166.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03657, over 2363323.19 frames. ], batch size: 36, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:27:12,953 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4629, 2.6289, 3.6433, 4.3778, 3.7970, 4.3956, 3.7654, 3.1335], device='cuda:0'), covar=tensor([0.0046, 0.0405, 0.0151, 0.0054, 0.0132, 0.0087, 0.0145, 0.0398], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0126, 0.0109, 0.0084, 0.0110, 0.0121, 0.0105, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:27:13,426 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.573e+02 3.016e+02 3.552e+02 5.688e+02, threshold=6.032e+02, percent-clipped=0.0 2023-05-18 22:27:26,623 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1953, 5.9990, 5.5643, 5.5075, 6.1329, 5.3862, 5.5300, 5.5178], device='cuda:0'), covar=tensor([0.1420, 0.0949, 0.1190, 0.1821, 0.0817, 0.2056, 0.1808, 0.1215], device='cuda:0'), in_proj_covar=tensor([0.0372, 0.0523, 0.0419, 0.0465, 0.0476, 0.0460, 0.0420, 0.0406], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:27:29,635 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2386, 4.5612, 4.0781, 4.9219, 4.3362, 2.8680, 4.0941, 2.8323], device='cuda:0'), covar=tensor([0.0832, 0.0780, 0.1446, 0.0488, 0.1209, 0.1736, 0.1164, 0.3612], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0388, 0.0369, 0.0345, 0.0381, 0.0279, 0.0356, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:27:34,801 INFO [finetune.py:992] (0/2) Epoch 18, batch 10250, loss[loss=0.1859, simple_loss=0.2815, pruned_loss=0.04513, over 12101.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2536, pruned_loss=0.03663, over 2372641.98 frames. 
], batch size: 39, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:27:37,919 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1530, 4.2831, 4.3863, 4.4043, 2.9710, 4.2051, 3.1469, 4.3577], device='cuda:0'), covar=tensor([0.1653, 0.0639, 0.0586, 0.0555, 0.1160, 0.0558, 0.1499, 0.1012], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0270, 0.0301, 0.0362, 0.0246, 0.0246, 0.0263, 0.0371], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:27:41,952 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318240.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:28:09,269 INFO [finetune.py:992] (0/2) Epoch 18, batch 10300, loss[loss=0.1684, simple_loss=0.255, pruned_loss=0.04091, over 12111.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2538, pruned_loss=0.03676, over 2372035.92 frames. ], batch size: 33, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:28:24,174 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.654e+02 2.970e+02 3.644e+02 8.965e+02, threshold=5.940e+02, percent-clipped=0.0 2023-05-18 22:28:31,771 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318311.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:28:44,849 INFO [finetune.py:992] (0/2) Epoch 18, batch 10350, loss[loss=0.1675, simple_loss=0.2602, pruned_loss=0.03744, over 12123.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2533, pruned_loss=0.03656, over 2374093.34 frames. ], batch size: 38, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:28:59,745 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318352.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:04,528 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5191, 2.5443, 3.7308, 4.4265, 3.8998, 4.4518, 3.7580, 3.1244], device='cuda:0'), covar=tensor([0.0046, 0.0432, 0.0149, 0.0062, 0.0132, 0.0088, 0.0153, 0.0413], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0126, 0.0108, 0.0084, 0.0109, 0.0121, 0.0105, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:29:10,171 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318367.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:12,403 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0581, 4.5478, 4.1247, 4.8956, 4.3492, 2.8307, 4.1140, 2.9871], device='cuda:0'), covar=tensor([0.0984, 0.0795, 0.1379, 0.0504, 0.1256, 0.1787, 0.1201, 0.3367], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0383, 0.0365, 0.0342, 0.0376, 0.0276, 0.0352, 0.0370], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:29:13,713 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318372.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:29:19,089 INFO [finetune.py:992] (0/2) Epoch 18, batch 10400, loss[loss=0.1379, simple_loss=0.2188, pruned_loss=0.0285, over 12011.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2531, pruned_loss=0.03685, over 2370517.20 frames. 
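The attn_weights_entropy tensors dumped by zipformer.py:1454 are diagnostics of how peaked the attention distributions are: larger values mean attention spread over many frames, values near zero mean attention concentrated on a single frame. A hedged sketch of how such a per-head summary can be computed from an attention-weight tensor (the exact tensor shapes and reductions used by Zipformer are assumptions here):

```python
# Per-head attention entropy, similar in spirit to the attn_weights_entropy
# diagnostics above.  Shape is assumed: (num_heads, query_len, key_len), with
# each row a probability distribution over keys.
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    p = attn_weights.clamp(min=eps)
    entropy = -(p * p.log()).sum(dim=-1)   # (num_heads, query_len)
    return entropy.mean(dim=-1)            # one value per head
```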
], batch size: 28, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:29:28,372 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0707, 3.9225, 2.6368, 2.3204, 3.4906, 2.3756, 3.5326, 2.8638], device='cuda:0'), covar=tensor([0.0737, 0.0699, 0.1190, 0.1587, 0.0357, 0.1345, 0.0551, 0.0846], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0265, 0.0180, 0.0203, 0.0146, 0.0187, 0.0204, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:29:32,857 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.765e+02 3.191e+02 3.806e+02 7.414e+02, threshold=6.382e+02, percent-clipped=4.0 2023-05-18 22:29:42,338 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318413.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:53,315 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318428.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:54,550 INFO [finetune.py:992] (0/2) Epoch 18, batch 10450, loss[loss=0.1562, simple_loss=0.2474, pruned_loss=0.03254, over 12103.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2525, pruned_loss=0.03638, over 2374760.98 frames. ], batch size: 32, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:30:07,882 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=318448.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:30:17,454 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1899, 2.0121, 2.3871, 2.1962, 2.3287, 2.4475, 1.9574, 2.4084], device='cuda:0'), covar=tensor([0.0154, 0.0340, 0.0218, 0.0245, 0.0178, 0.0184, 0.0341, 0.0178], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0222, 0.0211, 0.0204, 0.0237, 0.0184, 0.0213, 0.0210], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:30:29,651 INFO [finetune.py:992] (0/2) Epoch 18, batch 10500, loss[loss=0.1867, simple_loss=0.2764, pruned_loss=0.04846, over 12125.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2532, pruned_loss=0.03691, over 2368839.96 frames. ], batch size: 38, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:30:40,863 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=318496.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:30:43,542 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.590e+02 3.071e+02 3.704e+02 6.715e+02, threshold=6.142e+02, percent-clipped=1.0 2023-05-18 22:31:04,499 INFO [finetune.py:992] (0/2) Epoch 18, batch 10550, loss[loss=0.1747, simple_loss=0.2662, pruned_loss=0.04163, over 11773.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2532, pruned_loss=0.03676, over 2361464.16 frames. 
], batch size: 44, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:31:11,645 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=318540.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:31:22,818 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8352, 3.6151, 3.7551, 3.8672, 3.2299, 3.9679, 3.9163, 3.9719], device='cuda:0'), covar=tensor([0.0275, 0.0239, 0.0205, 0.0373, 0.0902, 0.0443, 0.0230, 0.0294], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0210, 0.0201, 0.0259, 0.0252, 0.0233, 0.0187, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 22:31:30,824 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-05-18 22:31:40,723 INFO [finetune.py:992] (0/2) Epoch 18, batch 10600, loss[loss=0.1566, simple_loss=0.2495, pruned_loss=0.0318, over 12361.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2534, pruned_loss=0.03704, over 2342494.61 frames. ], batch size: 35, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:31:46,283 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=318588.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:31:54,441 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.614e+02 2.999e+02 3.518e+02 7.847e+02, threshold=5.998e+02, percent-clipped=4.0 2023-05-18 22:31:59,866 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-05-18 22:32:16,147 INFO [finetune.py:992] (0/2) Epoch 18, batch 10650, loss[loss=0.1931, simple_loss=0.2731, pruned_loss=0.05652, over 12373.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2542, pruned_loss=0.03708, over 2344377.84 frames. ], batch size: 38, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:32:35,410 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5412, 5.3551, 5.4909, 5.5370, 5.1585, 5.2172, 4.9569, 5.4657], device='cuda:0'), covar=tensor([0.0720, 0.0651, 0.0885, 0.0587, 0.1862, 0.1395, 0.0589, 0.1099], device='cuda:0'), in_proj_covar=tensor([0.0566, 0.0748, 0.0650, 0.0659, 0.0891, 0.0780, 0.0594, 0.0510], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:32:41,495 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318667.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:32:50,187 INFO [finetune.py:992] (0/2) Epoch 18, batch 10700, loss[loss=0.1675, simple_loss=0.2574, pruned_loss=0.03883, over 12248.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.2544, pruned_loss=0.03732, over 2338456.71 frames. ], batch size: 32, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:33:04,731 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.617e+02 3.214e+02 3.905e+02 7.925e+02, threshold=6.428e+02, percent-clipped=1.0 2023-05-18 22:33:10,382 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318708.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:33:21,149 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318723.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:33:25,813 INFO [finetune.py:992] (0/2) Epoch 18, batch 10750, loss[loss=0.15, simple_loss=0.2432, pruned_loss=0.02836, over 12134.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.254, pruned_loss=0.0371, over 2347562.34 frames. 
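The scaling.py:679 "Whitening" lines compare a per-module statistic against a limit (metric=1.42 vs. limit=2.0 just above): when the activations within each channel group are close to white (similar variance in all directions, little correlation), the metric stays near 1; values approaching the limit indicate a collapsing or strongly correlated feature space. The exact statistic scaling.py computes is not shown in the log; the following is a stand-in whiteness measure with the same qualitative behaviour (equal to 1.0 for a perfectly white covariance and growing as the eigenvalue spread widens), offered only as an assumed illustration:

```python
# Assumed whiteness metric, NOT the actual scaling.py computation: ratio of the
# mean squared eigenvalue of the per-group feature covariance to the squared
# mean eigenvalue.  Equals 1.0 iff all eigenvalues are equal (white features).
import torch

def whiteness_metric(feats: torch.Tensor, num_groups: int) -> torch.Tensor:
    # feats: (num_frames, num_channels); channels are split into equal groups.
    num_frames, num_channels = feats.shape
    x = feats.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / num_frames
        eig = torch.linalg.eigvalsh(cov)
        metrics.append((eig ** 2).mean() / eig.mean() ** 2)
    return torch.stack(metrics).mean()
```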
], batch size: 34, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:33:27,364 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:34:00,314 INFO [finetune.py:992] (0/2) Epoch 18, batch 10800, loss[loss=0.1616, simple_loss=0.2484, pruned_loss=0.03746, over 12193.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2539, pruned_loss=0.03654, over 2357604.77 frames. ], batch size: 35, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:34:09,727 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318793.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:34:14,288 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.748e+02 3.175e+02 3.838e+02 7.308e+02, threshold=6.350e+02, percent-clipped=1.0 2023-05-18 22:34:35,449 INFO [finetune.py:992] (0/2) Epoch 18, batch 10850, loss[loss=0.2433, simple_loss=0.3157, pruned_loss=0.08544, over 8180.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2538, pruned_loss=0.0367, over 2356079.88 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:35:12,108 INFO [finetune.py:992] (0/2) Epoch 18, batch 10900, loss[loss=0.1535, simple_loss=0.2461, pruned_loss=0.03043, over 12022.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2544, pruned_loss=0.03705, over 2347904.73 frames. ], batch size: 31, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:35:18,448 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3230, 2.5613, 3.6173, 4.2805, 3.8112, 4.2942, 3.7861, 3.1150], device='cuda:0'), covar=tensor([0.0050, 0.0391, 0.0147, 0.0059, 0.0126, 0.0090, 0.0153, 0.0401], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0126, 0.0108, 0.0085, 0.0110, 0.0121, 0.0106, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:35:25,757 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.603e+02 3.099e+02 4.010e+02 6.423e+02, threshold=6.198e+02, percent-clipped=1.0 2023-05-18 22:35:46,288 INFO [finetune.py:992] (0/2) Epoch 18, batch 10950, loss[loss=0.1567, simple_loss=0.2462, pruned_loss=0.03356, over 12089.00 frames. ], tot_loss[loss=0.1648, simple_loss=0.2552, pruned_loss=0.03727, over 2352627.72 frames. ], batch size: 33, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:36:11,758 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=318967.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:36:20,713 INFO [finetune.py:992] (0/2) Epoch 18, batch 11000, loss[loss=0.1442, simple_loss=0.2296, pruned_loss=0.0294, over 12406.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.2559, pruned_loss=0.03783, over 2340360.09 frames. ], batch size: 32, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:36:32,533 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. 
limit=2.0 2023-05-18 22:36:35,486 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.719e+02 3.351e+02 4.549e+02 6.103e+02, threshold=6.702e+02, percent-clipped=0.0 2023-05-18 22:36:41,517 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319008.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:36:42,139 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4529, 5.2795, 5.4061, 5.4341, 5.0380, 5.1209, 4.8753, 5.3817], device='cuda:0'), covar=tensor([0.0768, 0.0654, 0.0819, 0.0681, 0.2157, 0.1429, 0.0641, 0.1142], device='cuda:0'), in_proj_covar=tensor([0.0569, 0.0750, 0.0650, 0.0661, 0.0894, 0.0782, 0.0595, 0.0513], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-18 22:36:46,102 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319015.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:36:51,608 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319023.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:36:56,337 INFO [finetune.py:992] (0/2) Epoch 18, batch 11050, loss[loss=0.1675, simple_loss=0.2714, pruned_loss=0.03175, over 12146.00 frames. ], tot_loss[loss=0.1689, simple_loss=0.2589, pruned_loss=0.03943, over 2315117.02 frames. ], batch size: 34, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:37:10,231 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8421, 3.6236, 3.7215, 3.8064, 3.5524, 3.8949, 3.9084, 3.9475], device='cuda:0'), covar=tensor([0.0205, 0.0210, 0.0176, 0.0462, 0.0469, 0.0436, 0.0190, 0.0235], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0209, 0.0200, 0.0257, 0.0250, 0.0232, 0.0186, 0.0241], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 22:37:13,985 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319056.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:24,072 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319071.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:27,461 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319076.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:30,555 INFO [finetune.py:992] (0/2) Epoch 18, batch 11100, loss[loss=0.1903, simple_loss=0.2751, pruned_loss=0.05277, over 12142.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.2625, pruned_loss=0.04164, over 2268458.25 frames. 
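grad_scale climbs from 8.0 through 16.0 to 32.0 over this stretch of the log, the signature of fp16 training with a dynamic loss scaler that doubles its scale after a long enough run of overflow-free steps and halves it when an overflow is detected. A generic mixed-precision training step with torch.cuda.amp, shown as a sketch rather than the actual finetune.py loop; the model/batch interface is an assumption:

```python
# Generic fp16 step.  GradScaler is the kind of component that produces
# grad_scale values like 8.0 -> 16.0 -> 32.0: it doubles the scale after
# growth_interval consecutive steps without inf/nan gradients.
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()  # initial scale and growth_interval left at defaults

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)              # assumed: forward pass returns the loss
    scaler.scale(loss).backward()        # backprop through the scaled loss
    scaler.step(optimizer)               # unscales grads; skips step on overflow
    scaler.update()                      # grows or shrinks the scale
    return loss.detach(), scaler.get_scale()
```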
], batch size: 38, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:37:36,085 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=319088.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:36,152 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4554, 5.2896, 5.3634, 5.4226, 5.0650, 5.1469, 4.8778, 5.3474], device='cuda:0'), covar=tensor([0.0763, 0.0637, 0.0947, 0.0650, 0.2032, 0.1443, 0.0638, 0.1164], device='cuda:0'), in_proj_covar=tensor([0.0568, 0.0747, 0.0649, 0.0659, 0.0890, 0.0778, 0.0594, 0.0512], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:37:44,872 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.031e+02 3.506e+02 4.333e+02 9.817e+02, threshold=7.011e+02, percent-clipped=7.0 2023-05-18 22:37:51,909 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-18 22:38:04,822 INFO [finetune.py:992] (0/2) Epoch 18, batch 11150, loss[loss=0.3386, simple_loss=0.3887, pruned_loss=0.1442, over 6403.00 frames. ], tot_loss[loss=0.1787, simple_loss=0.2672, pruned_loss=0.04506, over 2203565.19 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:38:10,032 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319137.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:38:24,297 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319158.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:38:40,210 INFO [finetune.py:992] (0/2) Epoch 18, batch 11200, loss[loss=0.3059, simple_loss=0.3596, pruned_loss=0.1261, over 6726.00 frames. ], tot_loss[loss=0.187, simple_loss=0.2746, pruned_loss=0.04968, over 2127182.36 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:38:54,758 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 3.242e+02 4.075e+02 4.801e+02 8.019e+02, threshold=8.151e+02, percent-clipped=3.0 2023-05-18 22:39:07,033 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319219.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:39:14,143 INFO [finetune.py:992] (0/2) Epoch 18, batch 11250, loss[loss=0.2956, simple_loss=0.3501, pruned_loss=0.1206, over 6984.00 frames. ], tot_loss[loss=0.193, simple_loss=0.2806, pruned_loss=0.05271, over 2094307.59 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:39:43,411 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7940, 4.1646, 3.7700, 4.5643, 3.8780, 2.7579, 3.8215, 2.9613], device='cuda:0'), covar=tensor([0.1057, 0.0900, 0.1534, 0.0465, 0.1661, 0.1962, 0.1252, 0.3573], device='cuda:0'), in_proj_covar=tensor([0.0309, 0.0377, 0.0359, 0.0335, 0.0370, 0.0274, 0.0347, 0.0365], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:39:48,363 INFO [finetune.py:992] (0/2) Epoch 18, batch 11300, loss[loss=0.2237, simple_loss=0.3067, pruned_loss=0.07039, over 7143.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2874, pruned_loss=0.05712, over 2029916.37 frames. 
], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:40:02,533 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.321e+02 3.438e+02 4.108e+02 5.089e+02 1.496e+03, threshold=8.216e+02, percent-clipped=3.0 2023-05-18 22:40:22,910 INFO [finetune.py:992] (0/2) Epoch 18, batch 11350, loss[loss=0.2701, simple_loss=0.3385, pruned_loss=0.1009, over 6612.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2923, pruned_loss=0.06042, over 1972120.57 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:40:55,874 INFO [finetune.py:992] (0/2) Epoch 18, batch 11400, loss[loss=0.2485, simple_loss=0.3191, pruned_loss=0.08901, over 7138.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2957, pruned_loss=0.06254, over 1952586.96 frames. ], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:40:57,413 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8396, 2.1739, 3.6135, 3.5011, 3.7184, 3.7771, 3.7526, 2.8161], device='cuda:0'), covar=tensor([0.0078, 0.0656, 0.0153, 0.0156, 0.0117, 0.0134, 0.0122, 0.0515], device='cuda:0'), in_proj_covar=tensor([0.0092, 0.0125, 0.0106, 0.0083, 0.0108, 0.0120, 0.0104, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:41:01,876 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319388.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:41:10,630 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 3.596e+02 4.227e+02 4.901e+02 9.608e+02, threshold=8.454e+02, percent-clipped=1.0 2023-05-18 22:41:29,392 INFO [finetune.py:992] (0/2) Epoch 18, batch 11450, loss[loss=0.2705, simple_loss=0.3351, pruned_loss=0.103, over 6795.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2982, pruned_loss=0.06475, over 1913743.60 frames. ], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:41:30,883 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=319432.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:41:33,438 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319436.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:41:46,158 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-05-18 22:42:03,054 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7881, 2.7758, 4.2009, 4.3993, 3.0404, 2.8180, 2.9866, 2.1661], device='cuda:0'), covar=tensor([0.1543, 0.2889, 0.0452, 0.0391, 0.1131, 0.2246, 0.2665, 0.4200], device='cuda:0'), in_proj_covar=tensor([0.0310, 0.0393, 0.0281, 0.0307, 0.0279, 0.0322, 0.0407, 0.0382], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:42:03,442 INFO [finetune.py:992] (0/2) Epoch 18, batch 11500, loss[loss=0.239, simple_loss=0.3209, pruned_loss=0.07853, over 11233.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.3012, pruned_loss=0.06732, over 1856497.25 frames. 
], batch size: 55, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:42:17,238 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.452e+02 3.355e+02 4.114e+02 5.171e+02 1.226e+03, threshold=8.229e+02, percent-clipped=1.0 2023-05-18 22:42:18,071 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6740, 3.7715, 3.7632, 3.8545, 3.6838, 3.6797, 3.5630, 3.7832], device='cuda:0'), covar=tensor([0.1580, 0.0707, 0.1466, 0.0836, 0.1508, 0.1291, 0.0664, 0.0971], device='cuda:0'), in_proj_covar=tensor([0.0551, 0.0717, 0.0629, 0.0633, 0.0854, 0.0753, 0.0574, 0.0493], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:42:27,214 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=319514.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:42:37,503 INFO [finetune.py:992] (0/2) Epoch 18, batch 11550, loss[loss=0.2959, simple_loss=0.3515, pruned_loss=0.1201, over 6818.00 frames. ], tot_loss[loss=0.221, simple_loss=0.3033, pruned_loss=0.06932, over 1831407.99 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:42:51,528 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0 2023-05-18 22:43:11,014 INFO [finetune.py:992] (0/2) Epoch 18, batch 11600, loss[loss=0.2333, simple_loss=0.3064, pruned_loss=0.08007, over 7164.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.3044, pruned_loss=0.07028, over 1819003.06 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:43:21,652 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2273, 1.9902, 2.1662, 2.1296, 2.2143, 2.3802, 1.9037, 2.2824], device='cuda:0'), covar=tensor([0.0099, 0.0299, 0.0116, 0.0198, 0.0151, 0.0154, 0.0289, 0.0147], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0210, 0.0197, 0.0192, 0.0223, 0.0173, 0.0201, 0.0199], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:43:25,089 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.352e+02 3.315e+02 3.958e+02 4.558e+02 7.057e+02, threshold=7.916e+02, percent-clipped=0.0 2023-05-18 22:43:46,172 INFO [finetune.py:992] (0/2) Epoch 18, batch 11650, loss[loss=0.2256, simple_loss=0.3032, pruned_loss=0.07403, over 7056.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.3035, pruned_loss=0.07045, over 1803575.97 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:44:19,767 INFO [finetune.py:992] (0/2) Epoch 18, batch 11700, loss[loss=0.1904, simple_loss=0.2817, pruned_loss=0.04955, over 10250.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.3025, pruned_loss=0.07032, over 1778755.87 frames. ], batch size: 68, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:44:34,857 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.533e+02 3.366e+02 3.819e+02 4.451e+02 1.022e+03, threshold=7.638e+02, percent-clipped=0.0 2023-05-18 22:44:41,144 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 22:44:53,812 INFO [finetune.py:992] (0/2) Epoch 18, batch 11750, loss[loss=0.2526, simple_loss=0.3176, pruned_loss=0.09384, over 6911.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.3028, pruned_loss=0.07123, over 1744341.65 frames. 
], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:44:55,312 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:45:15,643 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3163, 3.0251, 3.1072, 3.3173, 2.6015, 3.0790, 2.5991, 2.6393], device='cuda:0'), covar=tensor([0.1535, 0.0899, 0.0787, 0.0484, 0.1035, 0.0833, 0.1603, 0.0502], device='cuda:0'), in_proj_covar=tensor([0.0229, 0.0267, 0.0296, 0.0355, 0.0243, 0.0243, 0.0261, 0.0365], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:45:16,967 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319763.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:45:28,112 INFO [finetune.py:992] (0/2) Epoch 18, batch 11800, loss[loss=0.2194, simple_loss=0.3036, pruned_loss=0.06756, over 12097.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.3049, pruned_loss=0.07324, over 1715055.38 frames. ], batch size: 39, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:45:28,202 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319780.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:45:42,172 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.502e+02 3.958e+02 4.763e+02 8.177e+02, threshold=7.917e+02, percent-clipped=2.0 2023-05-18 22:45:51,487 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319814.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:45:58,613 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319824.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:46:02,329 INFO [finetune.py:992] (0/2) Epoch 18, batch 11850, loss[loss=0.2629, simple_loss=0.3261, pruned_loss=0.09982, over 7188.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.3078, pruned_loss=0.07524, over 1676106.80 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:46:06,085 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-18 22:46:11,866 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319844.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:46:23,502 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319862.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:46:36,335 INFO [finetune.py:992] (0/2) Epoch 18, batch 11900, loss[loss=0.1938, simple_loss=0.2902, pruned_loss=0.04871, over 10140.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.3065, pruned_loss=0.07328, over 1679173.64 frames. 
], batch size: 68, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:46:49,991 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 3.218e+02 3.820e+02 4.516e+02 7.088e+02, threshold=7.640e+02, percent-clipped=0.0 2023-05-18 22:46:52,761 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319905.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:47:07,707 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8718, 4.5334, 4.1133, 4.2897, 4.6051, 3.9231, 4.2129, 4.0553], device='cuda:0'), covar=tensor([0.1672, 0.1055, 0.1388, 0.1672, 0.0934, 0.2385, 0.1685, 0.1477], device='cuda:0'), in_proj_covar=tensor([0.0356, 0.0500, 0.0401, 0.0440, 0.0455, 0.0434, 0.0395, 0.0385], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:47:08,931 INFO [finetune.py:992] (0/2) Epoch 18, batch 11950, loss[loss=0.2078, simple_loss=0.2927, pruned_loss=0.06147, over 11115.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.304, pruned_loss=0.07116, over 1678320.10 frames. ], batch size: 55, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:47:43,128 INFO [finetune.py:992] (0/2) Epoch 18, batch 12000, loss[loss=0.1978, simple_loss=0.2815, pruned_loss=0.05702, over 7200.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.299, pruned_loss=0.06681, over 1692991.46 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:47:43,129 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 22:47:53,385 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4860, 5.4909, 5.3458, 4.7730, 4.9946, 5.5335, 5.0302, 5.1643], device='cuda:0'), covar=tensor([0.0804, 0.0962, 0.0518, 0.1911, 0.0510, 0.0489, 0.1303, 0.0763], device='cuda:0'), in_proj_covar=tensor([0.0631, 0.0566, 0.0513, 0.0632, 0.0424, 0.0723, 0.0768, 0.0564], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-18 22:48:01,395 INFO [finetune.py:1026] (0/2) Epoch 18, validation: loss=0.2892, simple_loss=0.3621, pruned_loss=0.1082, over 1020973.00 frames. 2023-05-18 22:48:01,396 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12516MB 2023-05-18 22:48:14,852 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-220000.pt 2023-05-18 22:48:18,104 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.845e+02 3.398e+02 3.978e+02 8.382e+02, threshold=6.795e+02, percent-clipped=2.0 2023-05-18 22:48:37,117 INFO [finetune.py:992] (0/2) Epoch 18, batch 12050, loss[loss=0.1957, simple_loss=0.2862, pruned_loss=0.05262, over 10363.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.295, pruned_loss=0.06362, over 1695326.95 frames. 
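The validation at batch 12000 repeats the same recipe as the earlier in-training validations: the model is scored on the full dev set (the frame count, 1020973.00, is identical each time) and the loss is reported as a frame-weighted average. A rough sketch of such an evaluation loop; the way the model consumes a batch is an assumption:

```python
# Frame-weighted validation loss over a fixed dev set, in the spirit of the
# "Computing validation loss" / "validation: loss=... over 1020973.00 frames"
# lines above.
import torch

def compute_validation_loss(model, dev_loader):
    was_training = model.training
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = model(batch)   # assumed: summed loss + frame count
            tot_loss += loss.item()
            tot_frames += num_frames
    if was_training:
        model.train()
    return tot_loss / tot_frames
```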
], batch size: 69, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:48:41,999 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0204, 3.8946, 2.5778, 2.2887, 3.4805, 2.4258, 3.5832, 2.8079], device='cuda:0'), covar=tensor([0.0754, 0.0504, 0.1273, 0.1803, 0.0272, 0.1455, 0.0444, 0.0968], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0250, 0.0174, 0.0197, 0.0140, 0.0183, 0.0194, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:48:57,530 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320060.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:49:09,766 INFO [finetune.py:992] (0/2) Epoch 18, batch 12100, loss[loss=0.2131, simple_loss=0.2887, pruned_loss=0.06872, over 6992.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2936, pruned_loss=0.06281, over 1690632.90 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:49:21,281 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7335, 3.5797, 3.5293, 3.6931, 3.7283, 3.7402, 3.6887, 2.6865], device='cuda:0'), covar=tensor([0.0102, 0.0107, 0.0161, 0.0094, 0.0066, 0.0141, 0.0103, 0.0865], device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0080, 0.0085, 0.0074, 0.0061, 0.0094, 0.0082, 0.0099], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:49:22,979 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.905e+02 3.486e+02 4.017e+02 7.717e+02, threshold=6.973e+02, percent-clipped=2.0 2023-05-18 22:49:31,887 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2023-05-18 22:49:34,071 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320119.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:49:34,334 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.38 vs. limit=5.0 2023-05-18 22:49:35,397 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320121.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:49:39,953 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.40 vs. limit=5.0 2023-05-18 22:49:40,769 INFO [finetune.py:992] (0/2) Epoch 18, batch 12150, loss[loss=0.2417, simple_loss=0.3123, pruned_loss=0.08555, over 7342.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2949, pruned_loss=0.06368, over 1678319.50 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:50:01,581 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320163.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:50:11,714 INFO [finetune.py:992] (0/2) Epoch 18, batch 12200, loss[loss=0.2267, simple_loss=0.3057, pruned_loss=0.07387, over 6742.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2959, pruned_loss=0.06464, over 1660072.47 frames. 
], batch size: 99, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:50:16,678 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.9157, 3.8132, 3.9122, 3.6442, 3.8036, 3.6679, 3.9014, 3.5563], device='cuda:0'), covar=tensor([0.0493, 0.0422, 0.0384, 0.0317, 0.0466, 0.0361, 0.0363, 0.1419], device='cuda:0'), in_proj_covar=tensor([0.0265, 0.0267, 0.0287, 0.0263, 0.0262, 0.0260, 0.0238, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:50:24,565 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320200.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:50:25,057 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.335e+02 3.821e+02 4.545e+02 1.101e+03, threshold=7.643e+02, percent-clipped=3.0 2023-05-18 22:50:34,082 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/epoch-18.pt 2023-05-18 22:50:52,878 INFO [finetune.py:992] (0/2) Epoch 19, batch 0, loss[loss=0.1998, simple_loss=0.2933, pruned_loss=0.05309, over 12064.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2933, pruned_loss=0.05309, over 12064.00 frames. ], batch size: 42, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:50:52,879 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 22:51:09,217 INFO [finetune.py:1026] (0/2) Epoch 19, validation: loss=0.2843, simple_loss=0.3593, pruned_loss=0.1047, over 1020973.00 frames. 2023-05-18 22:51:09,218 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-18 22:51:16,395 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320224.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:51:41,895 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320261.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:51:44,528 INFO [finetune.py:992] (0/2) Epoch 19, batch 50, loss[loss=0.1775, simple_loss=0.2763, pruned_loss=0.03935, over 12275.00 frames. ], tot_loss[loss=0.1702, simple_loss=0.2614, pruned_loss=0.03944, over 547403.94 frames. ], batch size: 37, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:51:44,906 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-18 22:52:11,003 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.864e+02 3.286e+02 3.794e+02 9.424e+02, threshold=6.572e+02, percent-clipped=1.0 2023-05-18 22:52:19,940 INFO [finetune.py:992] (0/2) Epoch 19, batch 100, loss[loss=0.1318, simple_loss=0.2177, pruned_loss=0.02297, over 12264.00 frames. ], tot_loss[loss=0.1664, simple_loss=0.2569, pruned_loss=0.03791, over 954990.61 frames. ], batch size: 28, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:52:25,575 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320322.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:52:54,373 INFO [finetune.py:992] (0/2) Epoch 19, batch 150, loss[loss=0.1889, simple_loss=0.28, pruned_loss=0.04886, over 12101.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.2566, pruned_loss=0.03746, over 1279076.29 frames. ], batch size: 39, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:53:06,658 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. 
limit=2.0 2023-05-18 22:53:09,633 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1188, 4.7569, 5.1154, 4.3841, 4.7430, 4.4730, 5.1452, 4.9281], device='cuda:0'), covar=tensor([0.0417, 0.0573, 0.0448, 0.0370, 0.0498, 0.0432, 0.0368, 0.0383], device='cuda:0'), in_proj_covar=tensor([0.0267, 0.0269, 0.0290, 0.0266, 0.0264, 0.0263, 0.0241, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:53:19,991 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.659e+02 2.635e+02 3.023e+02 3.573e+02 9.596e+02, threshold=6.047e+02, percent-clipped=2.0 2023-05-18 22:53:22,364 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3910, 4.7699, 3.1603, 2.8648, 4.0675, 2.6704, 4.0013, 3.3707], device='cuda:0'), covar=tensor([0.0719, 0.0638, 0.1057, 0.1608, 0.0315, 0.1446, 0.0572, 0.0882], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0252, 0.0175, 0.0199, 0.0140, 0.0185, 0.0195, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:53:29,038 INFO [finetune.py:992] (0/2) Epoch 19, batch 200, loss[loss=0.1667, simple_loss=0.2536, pruned_loss=0.03992, over 12104.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2558, pruned_loss=0.03749, over 1526658.11 frames. ], batch size: 33, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:53:30,531 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320416.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:53:33,243 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320419.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:53:43,605 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1891, 4.9019, 5.0864, 5.0492, 4.9204, 5.1131, 5.0517, 3.2949], device='cuda:0'), covar=tensor([0.0109, 0.0067, 0.0077, 0.0066, 0.0049, 0.0125, 0.0085, 0.0647], device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0081, 0.0086, 0.0075, 0.0062, 0.0096, 0.0083, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:53:48,649 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.66 vs. limit=5.0 2023-05-18 22:54:04,943 INFO [finetune.py:992] (0/2) Epoch 19, batch 250, loss[loss=0.1704, simple_loss=0.2651, pruned_loss=0.03791, over 12125.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.2562, pruned_loss=0.03737, over 1705478.99 frames. ], batch size: 38, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:54:06,927 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320467.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:54:30,025 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320500.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:54:31,282 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.586e+02 2.848e+02 3.553e+02 7.290e+02, threshold=5.696e+02, percent-clipped=1.0 2023-05-18 22:54:39,770 INFO [finetune.py:992] (0/2) Epoch 19, batch 300, loss[loss=0.1746, simple_loss=0.2657, pruned_loss=0.04174, over 12253.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2545, pruned_loss=0.03673, over 1858364.20 frames. 
], batch size: 37, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:54:43,236 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320519.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:54:57,892 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0446, 4.9593, 4.9033, 4.9347, 4.6259, 5.0345, 4.9803, 5.2805], device='cuda:0'), covar=tensor([0.0319, 0.0192, 0.0184, 0.0424, 0.0802, 0.0346, 0.0193, 0.0205], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0197, 0.0188, 0.0240, 0.0236, 0.0217, 0.0175, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2023-05-18 22:55:01,550 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.49 vs. limit=2.0 2023-05-18 22:55:03,336 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320548.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:55:14,332 INFO [finetune.py:992] (0/2) Epoch 19, batch 350, loss[loss=0.1574, simple_loss=0.2561, pruned_loss=0.02936, over 12371.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2536, pruned_loss=0.03628, over 1977625.52 frames. ], batch size: 35, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:55:16,576 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5557, 2.5022, 3.7018, 4.5269, 3.8103, 4.5330, 3.8696, 3.2892], device='cuda:0'), covar=tensor([0.0036, 0.0438, 0.0121, 0.0054, 0.0136, 0.0075, 0.0145, 0.0353], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0121, 0.0102, 0.0080, 0.0104, 0.0116, 0.0101, 0.0137], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:55:22,980 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-18 22:55:42,321 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.625e+02 3.106e+02 3.632e+02 9.872e+02, threshold=6.212e+02, percent-clipped=3.0 2023-05-18 22:55:50,480 INFO [finetune.py:992] (0/2) Epoch 19, batch 400, loss[loss=0.174, simple_loss=0.259, pruned_loss=0.04448, over 12107.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2541, pruned_loss=0.03662, over 2059003.68 frames. ], batch size: 32, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:55:52,585 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320617.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:56:19,772 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7093, 2.8137, 4.5826, 4.7688, 2.9218, 2.5228, 2.7635, 2.1354], device='cuda:0'), covar=tensor([0.1810, 0.3174, 0.0498, 0.0424, 0.1365, 0.2730, 0.3313, 0.4355], device='cuda:0'), in_proj_covar=tensor([0.0309, 0.0393, 0.0280, 0.0305, 0.0278, 0.0324, 0.0408, 0.0384], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:56:25,037 INFO [finetune.py:992] (0/2) Epoch 19, batch 450, loss[loss=0.1539, simple_loss=0.239, pruned_loss=0.03437, over 11800.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.254, pruned_loss=0.03668, over 2139180.43 frames. 
], batch size: 26, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:56:26,593 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320666.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:56:51,272 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.664e+02 3.120e+02 3.783e+02 1.584e+03, threshold=6.241e+02, percent-clipped=3.0 2023-05-18 22:56:59,592 INFO [finetune.py:992] (0/2) Epoch 19, batch 500, loss[loss=0.1462, simple_loss=0.23, pruned_loss=0.03124, over 11996.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2527, pruned_loss=0.03642, over 2193672.10 frames. ], batch size: 28, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:57:01,095 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320716.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:57:09,315 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320727.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:57:35,298 INFO [finetune.py:992] (0/2) Epoch 19, batch 550, loss[loss=0.1685, simple_loss=0.2586, pruned_loss=0.03925, over 12240.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03652, over 2231379.79 frames. ], batch size: 32, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:57:35,357 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320764.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:58:00,599 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9163, 2.2875, 3.4545, 2.9848, 3.4353, 2.9414, 2.3705, 3.4765], device='cuda:0'), covar=tensor([0.0226, 0.0548, 0.0283, 0.0333, 0.0233, 0.0306, 0.0547, 0.0193], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0209, 0.0196, 0.0191, 0.0223, 0.0172, 0.0202, 0.0197], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 22:58:01,762 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.446e+02 2.837e+02 3.398e+02 2.517e+03, threshold=5.673e+02, percent-clipped=2.0 2023-05-18 22:58:10,064 INFO [finetune.py:992] (0/2) Epoch 19, batch 600, loss[loss=0.2082, simple_loss=0.3049, pruned_loss=0.05578, over 12146.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2533, pruned_loss=0.0364, over 2257953.21 frames. ], batch size: 39, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:58:13,647 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320819.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:58:45,546 INFO [finetune.py:992] (0/2) Epoch 19, batch 650, loss[loss=0.2004, simple_loss=0.2823, pruned_loss=0.05923, over 7948.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03637, over 2271851.11 frames. 
], batch size: 99, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:58:45,768 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4549, 4.8495, 3.2737, 2.7593, 4.3286, 2.9064, 4.0770, 3.4648], device='cuda:0'), covar=tensor([0.0785, 0.0552, 0.1018, 0.1655, 0.0267, 0.1251, 0.0550, 0.0752], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0257, 0.0177, 0.0202, 0.0142, 0.0186, 0.0199, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 22:58:48,289 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320867.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:58:53,290 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5093, 4.0557, 3.8697, 4.3179, 3.2032, 3.9402, 2.7314, 4.2441], device='cuda:0'), covar=tensor([0.1327, 0.0698, 0.1311, 0.0944, 0.1045, 0.0601, 0.1759, 0.1059], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0270, 0.0299, 0.0358, 0.0246, 0.0245, 0.0265, 0.0369], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:58:58,893 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1493, 4.7081, 4.7305, 4.9327, 4.7677, 4.9133, 4.8532, 2.4677], device='cuda:0'), covar=tensor([0.0089, 0.0071, 0.0118, 0.0069, 0.0059, 0.0117, 0.0091, 0.0990], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0084, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:59:12,575 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.739e+02 2.680e+02 3.217e+02 3.943e+02 5.915e+02, threshold=6.434e+02, percent-clipped=1.0 2023-05-18 22:59:15,667 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2775, 3.1751, 4.6427, 2.5010, 2.5875, 3.5343, 2.8397, 3.6584], device='cuda:0'), covar=tensor([0.0476, 0.1305, 0.0364, 0.1280, 0.2124, 0.1502, 0.1577, 0.1213], device='cuda:0'), in_proj_covar=tensor([0.0237, 0.0239, 0.0259, 0.0186, 0.0239, 0.0292, 0.0226, 0.0267], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 22:59:20,815 INFO [finetune.py:992] (0/2) Epoch 19, batch 700, loss[loss=0.1767, simple_loss=0.2666, pruned_loss=0.04337, over 12167.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2536, pruned_loss=0.03653, over 2295879.00 frames. ], batch size: 36, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:59:23,083 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320917.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:59:55,583 INFO [finetune.py:992] (0/2) Epoch 19, batch 750, loss[loss=0.1292, simple_loss=0.2162, pruned_loss=0.02112, over 12194.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2527, pruned_loss=0.03616, over 2314064.50 frames. ], batch size: 29, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:59:56,312 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320965.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 23:00:21,786 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 2.715e+02 3.173e+02 3.808e+02 6.144e+02, threshold=6.346e+02, percent-clipped=0.0 2023-05-18 23:00:30,882 INFO [finetune.py:992] (0/2) Epoch 19, batch 800, loss[loss=0.1468, simple_loss=0.2316, pruned_loss=0.03099, over 12028.00 frames. 
], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03645, over 2326648.80 frames. ], batch size: 31, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:00:36,495 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321022.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:01:05,664 INFO [finetune.py:992] (0/2) Epoch 19, batch 850, loss[loss=0.1475, simple_loss=0.2355, pruned_loss=0.02979, over 12015.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2541, pruned_loss=0.03686, over 2335701.20 frames. ], batch size: 31, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:01:23,645 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7757, 3.5168, 3.5634, 3.7331, 3.6925, 3.7699, 3.6872, 2.5680], device='cuda:0'), covar=tensor([0.0106, 0.0131, 0.0168, 0.0098, 0.0080, 0.0128, 0.0106, 0.0858], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0084, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:01:31,913 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.562e+02 2.652e+02 3.102e+02 3.680e+02 5.824e+02, threshold=6.205e+02, percent-clipped=0.0 2023-05-18 23:01:40,219 INFO [finetune.py:992] (0/2) Epoch 19, batch 900, loss[loss=0.1618, simple_loss=0.2541, pruned_loss=0.03472, over 12357.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2533, pruned_loss=0.0364, over 2342198.42 frames. ], batch size: 35, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:01:44,129 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-18 23:01:48,202 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321125.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:01:48,801 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321126.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:13,096 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321160.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:15,744 INFO [finetune.py:992] (0/2) Epoch 19, batch 950, loss[loss=0.1394, simple_loss=0.225, pruned_loss=0.0269, over 12188.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2518, pruned_loss=0.03592, over 2357553.57 frames. ], batch size: 29, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:02:31,616 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321186.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:32,322 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321187.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:42,843 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.656e+02 3.176e+02 3.656e+02 5.814e+02, threshold=6.353e+02, percent-clipped=0.0 2023-05-18 23:02:51,122 INFO [finetune.py:992] (0/2) Epoch 19, batch 1000, loss[loss=0.1732, simple_loss=0.2666, pruned_loss=0.03992, over 12355.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2518, pruned_loss=0.03594, over 2364717.44 frames. 
], batch size: 35, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:02:56,229 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321221.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:03:05,796 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2947, 2.9314, 2.7652, 2.7559, 2.5719, 2.4521, 2.7053, 2.0617], device='cuda:0'), covar=tensor([0.0405, 0.0213, 0.0232, 0.0255, 0.0370, 0.0393, 0.0230, 0.0508], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0169, 0.0172, 0.0198, 0.0207, 0.0207, 0.0181, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:03:08,451 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9801, 5.8642, 5.4492, 5.3589, 5.9341, 5.1585, 5.3165, 5.4434], device='cuda:0'), covar=tensor([0.1632, 0.0910, 0.1090, 0.1896, 0.0938, 0.2367, 0.2196, 0.1202], device='cuda:0'), in_proj_covar=tensor([0.0365, 0.0516, 0.0416, 0.0456, 0.0471, 0.0451, 0.0406, 0.0395], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:03:25,694 INFO [finetune.py:992] (0/2) Epoch 19, batch 1050, loss[loss=0.1665, simple_loss=0.2604, pruned_loss=0.03631, over 11619.00 frames. ], tot_loss[loss=0.162, simple_loss=0.252, pruned_loss=0.03603, over 2367170.26 frames. ], batch size: 48, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:03:30,978 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-18 23:03:47,402 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0265, 5.9656, 5.5978, 5.5641, 6.0686, 5.2470, 5.5926, 5.5911], device='cuda:0'), covar=tensor([0.1671, 0.0851, 0.1084, 0.1703, 0.0830, 0.2295, 0.1645, 0.1250], device='cuda:0'), in_proj_covar=tensor([0.0367, 0.0517, 0.0416, 0.0456, 0.0472, 0.0452, 0.0406, 0.0396], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:03:52,218 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.627e+02 2.582e+02 2.917e+02 3.353e+02 7.382e+02, threshold=5.833e+02, percent-clipped=1.0 2023-05-18 23:04:01,504 INFO [finetune.py:992] (0/2) Epoch 19, batch 1100, loss[loss=0.1527, simple_loss=0.2408, pruned_loss=0.03232, over 12345.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2516, pruned_loss=0.03582, over 2367124.43 frames. ], batch size: 35, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:04:07,822 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321322.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:04:09,928 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321325.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:04:24,609 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2023-05-18 23:04:36,704 INFO [finetune.py:992] (0/2) Epoch 19, batch 1150, loss[loss=0.1412, simple_loss=0.2256, pruned_loss=0.02842, over 12273.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2518, pruned_loss=0.0361, over 2366730.83 frames. 
], batch size: 28, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:04:40,982 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321370.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:04:51,976 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321386.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:03,333 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.618e+02 3.087e+02 3.775e+02 7.185e+02, threshold=6.175e+02, percent-clipped=1.0 2023-05-18 23:05:11,865 INFO [finetune.py:992] (0/2) Epoch 19, batch 1200, loss[loss=0.182, simple_loss=0.2752, pruned_loss=0.04436, over 11181.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2523, pruned_loss=0.03598, over 2376114.68 frames. ], batch size: 55, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:05:16,204 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-18 23:05:18,695 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321424.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:30,860 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0763, 4.9626, 4.8240, 4.9112, 4.5925, 5.0235, 5.0447, 5.2779], device='cuda:0'), covar=tensor([0.0267, 0.0181, 0.0241, 0.0412, 0.0852, 0.0405, 0.0204, 0.0196], device='cuda:0'), in_proj_covar=tensor([0.0204, 0.0206, 0.0198, 0.0254, 0.0247, 0.0228, 0.0184, 0.0240], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:05:46,539 INFO [finetune.py:992] (0/2) Epoch 19, batch 1250, loss[loss=0.1604, simple_loss=0.2606, pruned_loss=0.03013, over 12273.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03601, over 2385211.61 frames. ], batch size: 37, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:05:53,648 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321473.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:59,156 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321481.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:59,845 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321482.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:01,971 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321485.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:13,536 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.547e+02 2.888e+02 3.407e+02 5.181e+02, threshold=5.777e+02, percent-clipped=0.0 2023-05-18 23:06:21,786 INFO [finetune.py:992] (0/2) Epoch 19, batch 1300, loss[loss=0.1773, simple_loss=0.2709, pruned_loss=0.04187, over 10629.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2527, pruned_loss=0.03609, over 2389482.43 frames. ], batch size: 69, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:06:23,334 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321516.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:35,898 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321534.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:56,530 INFO [finetune.py:992] (0/2) Epoch 19, batch 1350, loss[loss=0.1677, simple_loss=0.2563, pruned_loss=0.03954, over 12365.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2529, pruned_loss=0.03626, over 2384836.25 frames. 
], batch size: 35, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:07:23,242 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.752e+02 3.072e+02 3.598e+02 5.405e+02, threshold=6.145e+02, percent-clipped=0.0 2023-05-18 23:07:32,275 INFO [finetune.py:992] (0/2) Epoch 19, batch 1400, loss[loss=0.1646, simple_loss=0.2385, pruned_loss=0.04538, over 11846.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2521, pruned_loss=0.03615, over 2388288.51 frames. ], batch size: 26, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:08:08,029 INFO [finetune.py:992] (0/2) Epoch 19, batch 1450, loss[loss=0.2118, simple_loss=0.2877, pruned_loss=0.06796, over 7359.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2518, pruned_loss=0.03596, over 2375940.85 frames. ], batch size: 98, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:08:19,623 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321681.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:08:34,287 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.607e+02 3.008e+02 3.618e+02 6.815e+02, threshold=6.016e+02, percent-clipped=1.0 2023-05-18 23:08:42,650 INFO [finetune.py:992] (0/2) Epoch 19, batch 1500, loss[loss=0.1745, simple_loss=0.2687, pruned_loss=0.04013, over 10385.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2531, pruned_loss=0.03648, over 2378267.34 frames. ], batch size: 68, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:09:18,020 INFO [finetune.py:992] (0/2) Epoch 19, batch 1550, loss[loss=0.1667, simple_loss=0.2589, pruned_loss=0.03722, over 12109.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2522, pruned_loss=0.03605, over 2386159.35 frames. ], batch size: 39, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:09:25,199 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6099, 4.4573, 4.4505, 4.5267, 4.1895, 4.6500, 4.6062, 4.7862], device='cuda:0'), covar=tensor([0.0264, 0.0195, 0.0226, 0.0377, 0.0763, 0.0448, 0.0204, 0.0210], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0209, 0.0201, 0.0258, 0.0250, 0.0231, 0.0186, 0.0244], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:09:29,282 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321780.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:09:29,658 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.99 vs. limit=5.0 2023-05-18 23:09:29,996 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321781.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:09:31,351 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321782.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:09:45,113 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.545e+02 2.978e+02 3.691e+02 1.085e+03, threshold=5.957e+02, percent-clipped=1.0 2023-05-18 23:09:53,447 INFO [finetune.py:992] (0/2) Epoch 19, batch 1600, loss[loss=0.1701, simple_loss=0.267, pruned_loss=0.03662, over 11620.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.03594, over 2383520.21 frames. 
], batch size: 48, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:09:54,970 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321816.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:04,344 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321829.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:04,358 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321829.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:05,061 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321830.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:08,854 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.92 vs. limit=5.0 2023-05-18 23:10:28,091 INFO [finetune.py:992] (0/2) Epoch 19, batch 1650, loss[loss=0.1532, simple_loss=0.2486, pruned_loss=0.02887, over 12115.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2535, pruned_loss=0.03662, over 2360468.16 frames. ], batch size: 33, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:10:28,150 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321864.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:54,130 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.848e+02 3.238e+02 3.715e+02 9.115e+02, threshold=6.477e+02, percent-clipped=3.0 2023-05-18 23:10:59,949 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3003, 2.6599, 3.1260, 4.1200, 2.2004, 4.1448, 4.2941, 4.3310], device='cuda:0'), covar=tensor([0.0206, 0.1340, 0.0623, 0.0203, 0.1676, 0.0362, 0.0196, 0.0129], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0207, 0.0187, 0.0125, 0.0192, 0.0184, 0.0182, 0.0129], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:11:03,308 INFO [finetune.py:992] (0/2) Epoch 19, batch 1700, loss[loss=0.1735, simple_loss=0.268, pruned_loss=0.03954, over 12006.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2537, pruned_loss=0.03654, over 2370134.39 frames. ], batch size: 40, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:11:19,645 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321936.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:11:38,948 INFO [finetune.py:992] (0/2) Epoch 19, batch 1750, loss[loss=0.1585, simple_loss=0.2499, pruned_loss=0.03356, over 11677.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2532, pruned_loss=0.03605, over 2366379.01 frames. 
], batch size: 48, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:11:51,020 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321981.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:11:52,450 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7004, 4.3772, 4.6350, 4.1804, 4.4259, 4.2259, 4.6774, 4.2560], device='cuda:0'), covar=tensor([0.0301, 0.0393, 0.0339, 0.0268, 0.0357, 0.0311, 0.0232, 0.0780], device='cuda:0'), in_proj_covar=tensor([0.0281, 0.0281, 0.0307, 0.0278, 0.0276, 0.0276, 0.0251, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:11:53,185 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4645, 4.9042, 3.1904, 2.9992, 4.2714, 2.9901, 4.1041, 3.5816], device='cuda:0'), covar=tensor([0.0699, 0.0480, 0.1042, 0.1331, 0.0301, 0.1191, 0.0499, 0.0746], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0260, 0.0178, 0.0203, 0.0144, 0.0186, 0.0201, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:12:00,852 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.59 vs. limit=5.0 2023-05-18 23:12:02,146 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321997.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:12:04,291 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-222000.pt 2023-05-18 23:12:08,158 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.590e+02 2.927e+02 3.533e+02 7.060e+02, threshold=5.854e+02, percent-clipped=1.0 2023-05-18 23:12:16,649 INFO [finetune.py:992] (0/2) Epoch 19, batch 1800, loss[loss=0.16, simple_loss=0.2586, pruned_loss=0.03073, over 12360.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2527, pruned_loss=0.03585, over 2373485.95 frames. ], batch size: 38, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:12:27,059 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322029.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:12:52,434 INFO [finetune.py:992] (0/2) Epoch 19, batch 1850, loss[loss=0.1671, simple_loss=0.2547, pruned_loss=0.03979, over 12057.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2528, pruned_loss=0.03595, over 2374507.22 frames. 
], batch size: 42, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:12:58,199 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9232, 4.8140, 4.6980, 4.8330, 4.3977, 4.9462, 4.9134, 5.1051], device='cuda:0'), covar=tensor([0.0232, 0.0177, 0.0210, 0.0363, 0.0853, 0.0369, 0.0162, 0.0185], device='cuda:0'), in_proj_covar=tensor([0.0206, 0.0208, 0.0200, 0.0257, 0.0249, 0.0230, 0.0185, 0.0242], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:13:02,390 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4080, 4.0807, 4.1549, 4.2593, 4.1765, 4.3771, 4.2788, 2.5226], device='cuda:0'), covar=tensor([0.0106, 0.0090, 0.0121, 0.0082, 0.0069, 0.0107, 0.0081, 0.0904], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0083, 0.0088, 0.0077, 0.0063, 0.0098, 0.0085, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:13:03,744 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=322080.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:13:18,671 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.638e+02 2.489e+02 2.964e+02 3.499e+02 6.491e+02, threshold=5.927e+02, percent-clipped=1.0 2023-05-18 23:13:26,990 INFO [finetune.py:992] (0/2) Epoch 19, batch 1900, loss[loss=0.1777, simple_loss=0.2814, pruned_loss=0.03701, over 12285.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2532, pruned_loss=0.03642, over 2372072.79 frames. ], batch size: 37, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:13:33,266 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0215, 2.2877, 3.3965, 3.9871, 3.5770, 4.0015, 3.5012, 2.6059], device='cuda:0'), covar=tensor([0.0051, 0.0462, 0.0176, 0.0060, 0.0159, 0.0086, 0.0168, 0.0470], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0123, 0.0105, 0.0082, 0.0106, 0.0118, 0.0104, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:13:36,581 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322128.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:13:37,334 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=322129.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:14:01,584 INFO [finetune.py:992] (0/2) Epoch 19, batch 1950, loss[loss=0.1552, simple_loss=0.2545, pruned_loss=0.028, over 12287.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2537, pruned_loss=0.03637, over 2372973.38 frames. 
], batch size: 33, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:14:03,866 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3350, 5.1539, 5.2653, 5.3340, 4.9380, 4.9908, 4.6536, 5.2661], device='cuda:0'), covar=tensor([0.0839, 0.0732, 0.1009, 0.0708, 0.1995, 0.1544, 0.0710, 0.1118], device='cuda:0'), in_proj_covar=tensor([0.0567, 0.0738, 0.0644, 0.0652, 0.0880, 0.0771, 0.0589, 0.0510], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:14:10,538 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322177.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:14:28,143 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.612e+02 3.062e+02 3.546e+02 5.710e+02, threshold=6.124e+02, percent-clipped=0.0 2023-05-18 23:14:37,911 INFO [finetune.py:992] (0/2) Epoch 19, batch 2000, loss[loss=0.1834, simple_loss=0.2692, pruned_loss=0.04877, over 12052.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2537, pruned_loss=0.0361, over 2376355.31 frames. ], batch size: 37, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:15:13,350 INFO [finetune.py:992] (0/2) Epoch 19, batch 2050, loss[loss=0.1707, simple_loss=0.2638, pruned_loss=0.03882, over 11845.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2524, pruned_loss=0.0358, over 2374178.25 frames. ], batch size: 44, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:15:32,914 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=322292.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:15:39,796 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.505e+02 2.981e+02 3.557e+02 8.250e+02, threshold=5.962e+02, percent-clipped=1.0 2023-05-18 23:15:48,269 INFO [finetune.py:992] (0/2) Epoch 19, batch 2100, loss[loss=0.1473, simple_loss=0.2356, pruned_loss=0.02957, over 12289.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2529, pruned_loss=0.03576, over 2373276.73 frames. ], batch size: 33, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:15:52,671 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0488, 4.9348, 4.8362, 4.9860, 3.9599, 5.1506, 5.0654, 5.1214], device='cuda:0'), covar=tensor([0.0240, 0.0183, 0.0202, 0.0342, 0.1268, 0.0321, 0.0191, 0.0239], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0209, 0.0201, 0.0258, 0.0250, 0.0231, 0.0187, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:16:24,387 INFO [finetune.py:992] (0/2) Epoch 19, batch 2150, loss[loss=0.1378, simple_loss=0.2332, pruned_loss=0.02116, over 12083.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2513, pruned_loss=0.03547, over 2374382.07 frames. 
], batch size: 32, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:16:30,837 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9808, 2.2740, 3.5949, 3.0251, 3.5066, 3.1258, 2.4359, 3.5320], device='cuda:0'), covar=tensor([0.0190, 0.0567, 0.0207, 0.0289, 0.0183, 0.0244, 0.0518, 0.0192], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0213, 0.0200, 0.0193, 0.0227, 0.0175, 0.0205, 0.0201], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:16:50,911 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.665e+02 2.497e+02 2.936e+02 3.608e+02 4.796e+02, threshold=5.872e+02, percent-clipped=0.0 2023-05-18 23:16:54,077 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-18 23:16:59,365 INFO [finetune.py:992] (0/2) Epoch 19, batch 2200, loss[loss=0.1479, simple_loss=0.2419, pruned_loss=0.02694, over 11657.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.251, pruned_loss=0.03537, over 2375804.96 frames. ], batch size: 48, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:17:03,933 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.94 vs. limit=5.0 2023-05-18 23:17:33,625 INFO [finetune.py:992] (0/2) Epoch 19, batch 2250, loss[loss=0.1649, simple_loss=0.2497, pruned_loss=0.04007, over 12277.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2511, pruned_loss=0.03536, over 2377087.56 frames. ], batch size: 28, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:18:00,766 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.587e+02 3.015e+02 3.733e+02 1.019e+03, threshold=6.030e+02, percent-clipped=3.0 2023-05-18 23:18:09,541 INFO [finetune.py:992] (0/2) Epoch 19, batch 2300, loss[loss=0.1855, simple_loss=0.277, pruned_loss=0.04707, over 12289.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2509, pruned_loss=0.03547, over 2379774.16 frames. ], batch size: 37, lr: 3.18e-03, grad_scale: 16.0 2023-05-18 23:18:10,508 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8078, 3.5409, 5.1236, 2.6137, 2.8841, 3.7891, 3.3036, 3.8046], device='cuda:0'), covar=tensor([0.0448, 0.1153, 0.0337, 0.1301, 0.2042, 0.1669, 0.1362, 0.1298], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0245, 0.0270, 0.0192, 0.0244, 0.0303, 0.0232, 0.0278], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:18:44,133 INFO [finetune.py:992] (0/2) Epoch 19, batch 2350, loss[loss=0.1517, simple_loss=0.2479, pruned_loss=0.02779, over 10624.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2517, pruned_loss=0.03566, over 2372027.96 frames. ], batch size: 68, lr: 3.18e-03, grad_scale: 16.0 2023-05-18 23:19:03,596 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=322592.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:19:07,232 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2391, 4.8808, 5.2297, 4.5956, 4.9417, 4.6903, 5.2669, 4.8678], device='cuda:0'), covar=tensor([0.0341, 0.0405, 0.0311, 0.0275, 0.0396, 0.0314, 0.0225, 0.0393], device='cuda:0'), in_proj_covar=tensor([0.0282, 0.0283, 0.0306, 0.0279, 0.0278, 0.0277, 0.0253, 0.0227], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:19:08,864 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. 
limit=2.0 2023-05-18 23:19:10,913 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.735e+02 3.133e+02 3.848e+02 5.750e+02, threshold=6.265e+02, percent-clipped=0.0 2023-05-18 23:19:16,137 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2023-05-18 23:19:19,404 INFO [finetune.py:992] (0/2) Epoch 19, batch 2400, loss[loss=0.1504, simple_loss=0.2494, pruned_loss=0.02572, over 12342.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2512, pruned_loss=0.03566, over 2369943.60 frames. ], batch size: 36, lr: 3.18e-03, grad_scale: 16.0 2023-05-18 23:19:35,600 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.66 vs. limit=2.0 2023-05-18 23:19:37,416 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322640.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:19:55,252 INFO [finetune.py:992] (0/2) Epoch 19, batch 2450, loss[loss=0.1575, simple_loss=0.2473, pruned_loss=0.03388, over 12157.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2513, pruned_loss=0.03591, over 2365877.65 frames. ], batch size: 34, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:20:21,560 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.551e+02 2.922e+02 3.455e+02 4.706e+02, threshold=5.844e+02, percent-clipped=0.0 2023-05-18 23:20:29,732 INFO [finetune.py:992] (0/2) Epoch 19, batch 2500, loss[loss=0.148, simple_loss=0.2305, pruned_loss=0.03276, over 12119.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2516, pruned_loss=0.0363, over 2367024.70 frames. ], batch size: 30, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:20:32,035 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7582, 2.4524, 3.1778, 2.7741, 3.1053, 2.9654, 2.4371, 3.1984], device='cuda:0'), covar=tensor([0.0164, 0.0348, 0.0199, 0.0261, 0.0189, 0.0191, 0.0361, 0.0164], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0215, 0.0201, 0.0196, 0.0229, 0.0176, 0.0207, 0.0202], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:20:38,832 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.2180, 6.1593, 5.9819, 5.4458, 5.3741, 6.1059, 5.7490, 5.5220], device='cuda:0'), covar=tensor([0.0658, 0.0867, 0.0675, 0.1783, 0.0670, 0.0712, 0.1659, 0.1096], device='cuda:0'), in_proj_covar=tensor([0.0655, 0.0591, 0.0539, 0.0667, 0.0441, 0.0762, 0.0814, 0.0590], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 23:21:01,858 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2791, 4.9007, 5.1469, 5.0858, 5.0844, 5.1183, 5.0325, 2.7449], device='cuda:0'), covar=tensor([0.0097, 0.0063, 0.0069, 0.0052, 0.0040, 0.0089, 0.0065, 0.0817], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0084, 0.0089, 0.0078, 0.0064, 0.0099, 0.0086, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:21:04,478 INFO [finetune.py:992] (0/2) Epoch 19, batch 2550, loss[loss=0.1653, simple_loss=0.2548, pruned_loss=0.03796, over 11623.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2512, pruned_loss=0.0359, over 2372200.90 frames. 
], batch size: 48, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:21:09,645 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6194, 2.5751, 3.8286, 4.6544, 3.9272, 4.6037, 3.8225, 3.3536], device='cuda:0'), covar=tensor([0.0037, 0.0408, 0.0123, 0.0042, 0.0111, 0.0082, 0.0165, 0.0341], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0123, 0.0104, 0.0081, 0.0104, 0.0117, 0.0103, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:21:31,740 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.580e+02 3.047e+02 3.667e+02 9.272e+02, threshold=6.094e+02, percent-clipped=2.0 2023-05-18 23:21:36,003 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2558, 2.7758, 3.7302, 3.2127, 3.6159, 3.3287, 2.9391, 3.6816], device='cuda:0'), covar=tensor([0.0162, 0.0401, 0.0172, 0.0282, 0.0194, 0.0209, 0.0361, 0.0158], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0216, 0.0202, 0.0197, 0.0230, 0.0177, 0.0207, 0.0204], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:21:40,566 INFO [finetune.py:992] (0/2) Epoch 19, batch 2600, loss[loss=0.1368, simple_loss=0.2238, pruned_loss=0.02491, over 12352.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2503, pruned_loss=0.03562, over 2376313.98 frames. ], batch size: 30, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:21:50,487 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7251, 3.5849, 5.2126, 2.8156, 2.8125, 3.8927, 3.1644, 3.8058], device='cuda:0'), covar=tensor([0.0541, 0.1080, 0.0443, 0.1175, 0.1956, 0.1695, 0.1506, 0.1274], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0245, 0.0269, 0.0191, 0.0244, 0.0303, 0.0232, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:22:14,662 INFO [finetune.py:992] (0/2) Epoch 19, batch 2650, loss[loss=0.1535, simple_loss=0.2416, pruned_loss=0.03271, over 12254.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2504, pruned_loss=0.03514, over 2381993.80 frames. ], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:22:39,482 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.20 vs. limit=5.0 2023-05-18 23:22:41,151 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.673e+02 2.569e+02 3.082e+02 3.739e+02 6.003e+02, threshold=6.165e+02, percent-clipped=1.0 2023-05-18 23:22:45,541 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9283, 3.6260, 5.2774, 2.8101, 2.9908, 3.9083, 3.3427, 3.8483], device='cuda:0'), covar=tensor([0.0365, 0.1046, 0.0294, 0.1199, 0.1940, 0.1569, 0.1303, 0.1264], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0245, 0.0268, 0.0191, 0.0244, 0.0303, 0.0232, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:22:49,452 INFO [finetune.py:992] (0/2) Epoch 19, batch 2700, loss[loss=0.1623, simple_loss=0.2517, pruned_loss=0.03642, over 12365.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2506, pruned_loss=0.03509, over 2382464.67 frames. ], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:23:25,450 INFO [finetune.py:992] (0/2) Epoch 19, batch 2750, loss[loss=0.1706, simple_loss=0.2642, pruned_loss=0.03851, over 12367.00 frames. 
], tot_loss[loss=0.1611, simple_loss=0.2514, pruned_loss=0.03544, over 2376660.74 frames. ], batch size: 35, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:23:41,449 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=322987.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:23:51,991 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.607e+02 2.529e+02 2.986e+02 3.645e+02 7.663e+02, threshold=5.971e+02, percent-clipped=1.0 2023-05-18 23:24:00,412 INFO [finetune.py:992] (0/2) Epoch 19, batch 2800, loss[loss=0.1527, simple_loss=0.2377, pruned_loss=0.03388, over 11775.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2509, pruned_loss=0.03516, over 2374807.11 frames. ], batch size: 26, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:24:24,018 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323048.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:24:35,129 INFO [finetune.py:992] (0/2) Epoch 19, batch 2850, loss[loss=0.158, simple_loss=0.2337, pruned_loss=0.04119, over 12283.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2501, pruned_loss=0.03511, over 2378147.83 frames. ], batch size: 28, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:24:52,743 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5188, 2.4989, 3.1712, 4.3027, 2.3138, 4.3837, 4.5007, 4.5376], device='cuda:0'), covar=tensor([0.0139, 0.1310, 0.0544, 0.0162, 0.1439, 0.0269, 0.0149, 0.0102], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0207, 0.0187, 0.0125, 0.0193, 0.0185, 0.0184, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:24:56,896 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2380, 3.9094, 3.9558, 4.3184, 3.0473, 3.7575, 2.6555, 3.9572], device='cuda:0'), covar=tensor([0.1665, 0.0725, 0.0977, 0.0595, 0.1130, 0.0692, 0.1898, 0.0997], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0270, 0.0302, 0.0363, 0.0246, 0.0245, 0.0262, 0.0370], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:25:02,863 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.503e+02 2.936e+02 3.359e+02 7.469e+02, threshold=5.872e+02, percent-clipped=3.0 2023-05-18 23:25:11,042 INFO [finetune.py:992] (0/2) Epoch 19, batch 2900, loss[loss=0.1755, simple_loss=0.2793, pruned_loss=0.03581, over 12187.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2513, pruned_loss=0.03555, over 2370972.38 frames. ], batch size: 35, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:25:46,005 INFO [finetune.py:992] (0/2) Epoch 19, batch 2950, loss[loss=0.1745, simple_loss=0.2692, pruned_loss=0.03989, over 12281.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2509, pruned_loss=0.03556, over 2381930.93 frames. 
], batch size: 34, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:26:12,553 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.639e+02 2.607e+02 2.944e+02 3.596e+02 8.072e+02, threshold=5.887e+02, percent-clipped=1.0 2023-05-18 23:26:13,464 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0494, 2.5681, 3.6150, 2.9895, 3.4308, 3.1204, 2.5136, 3.5356], device='cuda:0'), covar=tensor([0.0178, 0.0376, 0.0220, 0.0302, 0.0202, 0.0253, 0.0421, 0.0161], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0215, 0.0201, 0.0195, 0.0229, 0.0177, 0.0207, 0.0203], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:26:20,548 INFO [finetune.py:992] (0/2) Epoch 19, batch 3000, loss[loss=0.1763, simple_loss=0.2692, pruned_loss=0.0417, over 12056.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2509, pruned_loss=0.03569, over 2388049.28 frames. ], batch size: 42, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:26:20,549 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-18 23:26:30,421 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9722, 4.9629, 4.8595, 4.8682, 4.2831, 5.0466, 4.9639, 5.1184], device='cuda:0'), covar=tensor([0.0301, 0.0156, 0.0194, 0.0333, 0.0964, 0.0307, 0.0172, 0.0180], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0209, 0.0201, 0.0260, 0.0250, 0.0232, 0.0188, 0.0244], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:26:38,294 INFO [finetune.py:1026] (0/2) Epoch 19, validation: loss=0.3167, simple_loss=0.3909, pruned_loss=0.1212, over 1020973.00 frames. 2023-05-18 23:26:38,295 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-18 23:27:12,133 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7019, 2.7344, 4.4418, 4.5861, 2.6453, 2.5756, 2.9266, 2.1737], device='cuda:0'), covar=tensor([0.1822, 0.3121, 0.0487, 0.0477, 0.1579, 0.2786, 0.3050, 0.4302], device='cuda:0'), in_proj_covar=tensor([0.0313, 0.0397, 0.0284, 0.0311, 0.0282, 0.0327, 0.0410, 0.0386], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:27:12,526 INFO [finetune.py:992] (0/2) Epoch 19, batch 3050, loss[loss=0.161, simple_loss=0.2519, pruned_loss=0.035, over 12007.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2513, pruned_loss=0.03599, over 2377375.36 frames. ], batch size: 40, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:27:38,841 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.566e+02 3.043e+02 3.630e+02 7.706e+02, threshold=6.086e+02, percent-clipped=3.0 2023-05-18 23:27:47,224 INFO [finetune.py:992] (0/2) Epoch 19, batch 3100, loss[loss=0.1506, simple_loss=0.2348, pruned_loss=0.03319, over 10284.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2502, pruned_loss=0.03552, over 2369309.95 frames. 
], batch size: 68, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:27:50,264 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6640, 3.3396, 5.1843, 2.5406, 2.8167, 3.7638, 3.1519, 3.7706], device='cuda:0'), covar=tensor([0.0518, 0.1219, 0.0351, 0.1305, 0.1992, 0.1700, 0.1498, 0.1271], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0245, 0.0271, 0.0191, 0.0245, 0.0304, 0.0233, 0.0279], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:27:55,023 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7407, 3.4394, 5.2410, 2.6433, 2.8901, 3.9078, 3.1402, 3.8649], device='cuda:0'), covar=tensor([0.0451, 0.1101, 0.0262, 0.1270, 0.1937, 0.1463, 0.1480, 0.1227], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0245, 0.0271, 0.0191, 0.0245, 0.0304, 0.0232, 0.0279], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:28:07,421 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=323343.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:28:23,316 INFO [finetune.py:992] (0/2) Epoch 19, batch 3150, loss[loss=0.1459, simple_loss=0.2389, pruned_loss=0.02647, over 12415.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2506, pruned_loss=0.03535, over 2370032.07 frames. ], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:28:25,038 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-05-18 23:28:27,004 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323369.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:28:49,926 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.619e+02 2.536e+02 2.903e+02 3.548e+02 1.078e+03, threshold=5.805e+02, percent-clipped=2.0 2023-05-18 23:28:58,411 INFO [finetune.py:992] (0/2) Epoch 19, batch 3200, loss[loss=0.1321, simple_loss=0.215, pruned_loss=0.0246, over 12188.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2501, pruned_loss=0.03517, over 2371888.70 frames. ], batch size: 29, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:29:09,750 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323430.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:29:28,516 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323457.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:29:33,190 INFO [finetune.py:992] (0/2) Epoch 19, batch 3250, loss[loss=0.1772, simple_loss=0.2737, pruned_loss=0.04036, over 11565.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2509, pruned_loss=0.03538, over 2377637.84 frames. ], batch size: 48, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:29:54,528 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. 
limit=2.0 2023-05-18 23:29:57,716 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2399, 6.2345, 5.8205, 5.8262, 6.2870, 5.5742, 5.7545, 5.7658], device='cuda:0'), covar=tensor([0.1519, 0.0861, 0.1211, 0.1735, 0.0944, 0.2125, 0.1875, 0.1072], device='cuda:0'), in_proj_covar=tensor([0.0372, 0.0524, 0.0421, 0.0465, 0.0482, 0.0466, 0.0417, 0.0403], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:30:00,424 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.588e+02 2.965e+02 3.406e+02 6.970e+02, threshold=5.929e+02, percent-clipped=2.0 2023-05-18 23:30:09,257 INFO [finetune.py:992] (0/2) Epoch 19, batch 3300, loss[loss=0.1964, simple_loss=0.2799, pruned_loss=0.05642, over 12035.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.251, pruned_loss=0.03568, over 2366204.79 frames. ], batch size: 40, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:30:11,008 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.60 vs. limit=2.0 2023-05-18 23:30:12,214 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323518.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:30:28,178 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1995, 4.7303, 4.0624, 4.8990, 4.4181, 2.4231, 3.9202, 2.9895], device='cuda:0'), covar=tensor([0.0814, 0.0591, 0.1414, 0.0527, 0.1168, 0.2094, 0.1383, 0.3260], device='cuda:0'), in_proj_covar=tensor([0.0311, 0.0382, 0.0366, 0.0341, 0.0377, 0.0280, 0.0351, 0.0369], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:30:43,806 INFO [finetune.py:992] (0/2) Epoch 19, batch 3350, loss[loss=0.1491, simple_loss=0.2438, pruned_loss=0.02724, over 12117.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2509, pruned_loss=0.03563, over 2373464.91 frames. ], batch size: 30, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:30:59,764 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6185, 2.7316, 3.7745, 4.6454, 3.9681, 4.6121, 3.8276, 3.3715], device='cuda:0'), covar=tensor([0.0041, 0.0391, 0.0137, 0.0046, 0.0108, 0.0072, 0.0168, 0.0361], device='cuda:0'), in_proj_covar=tensor([0.0092, 0.0124, 0.0106, 0.0082, 0.0106, 0.0119, 0.0105, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:31:10,324 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.670e+02 3.086e+02 3.768e+02 7.521e+02, threshold=6.171e+02, percent-clipped=2.0 2023-05-18 23:31:18,785 INFO [finetune.py:992] (0/2) Epoch 19, batch 3400, loss[loss=0.2163, simple_loss=0.2918, pruned_loss=0.0704, over 8077.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2509, pruned_loss=0.0356, over 2372373.77 frames. ], batch size: 97, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:31:39,218 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=323643.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:31:54,289 INFO [finetune.py:992] (0/2) Epoch 19, batch 3450, loss[loss=0.164, simple_loss=0.2514, pruned_loss=0.03836, over 12125.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2509, pruned_loss=0.03578, over 2377054.45 frames. 
], batch size: 42, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:31:54,495 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323664.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:01,284 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2172, 6.2308, 5.7126, 5.8250, 6.2287, 5.4998, 5.7370, 5.7161], device='cuda:0'), covar=tensor([0.1455, 0.0819, 0.1095, 0.1540, 0.0956, 0.2043, 0.1778, 0.1022], device='cuda:0'), in_proj_covar=tensor([0.0369, 0.0518, 0.0416, 0.0459, 0.0479, 0.0460, 0.0413, 0.0399], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:32:01,474 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9208, 3.5431, 5.3063, 2.7488, 3.0942, 3.7874, 3.3608, 3.8158], device='cuda:0'), covar=tensor([0.0469, 0.1131, 0.0372, 0.1297, 0.1890, 0.1840, 0.1358, 0.1333], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0247, 0.0273, 0.0193, 0.0247, 0.0306, 0.0234, 0.0281], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:32:12,910 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=323691.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:19,490 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7359, 3.6418, 3.2234, 3.1556, 2.8261, 2.7870, 3.5638, 2.4288], device='cuda:0'), covar=tensor([0.0368, 0.0139, 0.0195, 0.0229, 0.0457, 0.0382, 0.0147, 0.0550], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0172, 0.0174, 0.0199, 0.0208, 0.0207, 0.0182, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:32:21,394 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.730e+02 3.210e+02 3.631e+02 5.460e+02, threshold=6.419e+02, percent-clipped=0.0 2023-05-18 23:32:29,811 INFO [finetune.py:992] (0/2) Epoch 19, batch 3500, loss[loss=0.1564, simple_loss=0.249, pruned_loss=0.03188, over 12173.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2506, pruned_loss=0.03534, over 2381672.34 frames. ], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:32:32,187 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.48 vs. limit=2.0 2023-05-18 23:32:36,727 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4735, 5.3127, 5.4354, 5.4934, 4.9448, 4.9765, 4.9337, 5.3388], device='cuda:0'), covar=tensor([0.0968, 0.0874, 0.0985, 0.0781, 0.2861, 0.1845, 0.0696, 0.1416], device='cuda:0'), in_proj_covar=tensor([0.0571, 0.0742, 0.0653, 0.0657, 0.0891, 0.0774, 0.0590, 0.0512], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:32:37,332 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=323725.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 23:32:37,417 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323725.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:37,447 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323725.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:51,674 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. 
limit=2.0 2023-05-18 23:33:04,767 INFO [finetune.py:992] (0/2) Epoch 19, batch 3550, loss[loss=0.1505, simple_loss=0.2424, pruned_loss=0.0293, over 12185.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2499, pruned_loss=0.03526, over 2381526.17 frames. ], batch size: 31, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:33:09,029 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323770.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:33:16,044 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-18 23:33:20,739 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323786.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:33:32,468 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.617e+02 3.192e+02 3.994e+02 7.138e+02, threshold=6.385e+02, percent-clipped=1.0 2023-05-18 23:33:40,292 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=323813.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:33:40,898 INFO [finetune.py:992] (0/2) Epoch 19, batch 3600, loss[loss=0.1711, simple_loss=0.2572, pruned_loss=0.04251, over 12038.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2508, pruned_loss=0.03555, over 2372031.02 frames. ], batch size: 37, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:33:51,490 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.21 vs. limit=5.0 2023-05-18 23:33:52,767 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323831.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:34:15,675 INFO [finetune.py:992] (0/2) Epoch 19, batch 3650, loss[loss=0.1351, simple_loss=0.2257, pruned_loss=0.02224, over 12091.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2509, pruned_loss=0.03552, over 2369255.65 frames. ], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:34:37,790 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7091, 3.7770, 3.3828, 3.2756, 2.9975, 2.8341, 3.7878, 2.6121], device='cuda:0'), covar=tensor([0.0446, 0.0209, 0.0252, 0.0249, 0.0480, 0.0477, 0.0160, 0.0515], device='cuda:0'), in_proj_covar=tensor([0.0200, 0.0172, 0.0174, 0.0200, 0.0208, 0.0208, 0.0183, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:34:38,406 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9849, 4.9097, 4.8118, 4.8999, 4.5150, 5.0385, 5.0472, 5.1944], device='cuda:0'), covar=tensor([0.0283, 0.0180, 0.0225, 0.0383, 0.0850, 0.0368, 0.0171, 0.0186], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0208, 0.0201, 0.0260, 0.0250, 0.0231, 0.0188, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:34:41,546 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.739e+02 3.134e+02 3.664e+02 6.861e+02, threshold=6.267e+02, percent-clipped=1.0 2023-05-18 23:34:49,961 INFO [finetune.py:992] (0/2) Epoch 19, batch 3700, loss[loss=0.1584, simple_loss=0.2571, pruned_loss=0.02983, over 11208.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2519, pruned_loss=0.03622, over 2368172.55 frames. ], batch size: 55, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:34:57,784 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.80 vs. 
limit=5.0 2023-05-18 23:35:25,638 INFO [finetune.py:992] (0/2) Epoch 19, batch 3750, loss[loss=0.1789, simple_loss=0.2816, pruned_loss=0.03816, over 12063.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2514, pruned_loss=0.03589, over 2367909.55 frames. ], batch size: 42, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:35:28,495 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3682, 2.4632, 3.0324, 4.1119, 2.1406, 4.1619, 4.2713, 4.4020], device='cuda:0'), covar=tensor([0.0148, 0.1371, 0.0587, 0.0207, 0.1551, 0.0319, 0.0187, 0.0122], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0203, 0.0185, 0.0125, 0.0190, 0.0184, 0.0182, 0.0129], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:35:38,178 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7202, 3.8571, 3.4097, 3.3059, 2.9995, 2.8948, 3.8267, 2.6348], device='cuda:0'), covar=tensor([0.0398, 0.0174, 0.0212, 0.0227, 0.0469, 0.0458, 0.0147, 0.0523], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0173, 0.0175, 0.0201, 0.0210, 0.0209, 0.0184, 0.0214], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:35:40,920 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323986.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:35:50,775 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-224000.pt 2023-05-18 23:35:55,031 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.453e+02 2.878e+02 3.473e+02 6.728e+02, threshold=5.757e+02, percent-clipped=2.0 2023-05-18 23:36:03,533 INFO [finetune.py:992] (0/2) Epoch 19, batch 3800, loss[loss=0.1696, simple_loss=0.2648, pruned_loss=0.03716, over 12108.00 frames. ], tot_loss[loss=0.161, simple_loss=0.251, pruned_loss=0.03546, over 2376215.90 frames. ], batch size: 38, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:36:07,757 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324020.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:11,389 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324025.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 23:36:26,583 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=324047.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:38,633 INFO [finetune.py:992] (0/2) Epoch 19, batch 3850, loss[loss=0.1627, simple_loss=0.2521, pruned_loss=0.03663, over 12180.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2516, pruned_loss=0.03538, over 2379985.76 frames. ], batch size: 35, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:36:45,069 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324073.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:50,706 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324081.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:59,517 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-05-18 23:37:05,944 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.522e+02 3.008e+02 3.489e+02 6.361e+02, threshold=6.016e+02, percent-clipped=2.0 2023-05-18 23:37:11,091 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1199, 6.0920, 5.9299, 5.3418, 5.3071, 6.0086, 5.6220, 5.4682], device='cuda:0'), covar=tensor([0.0858, 0.1092, 0.0688, 0.2010, 0.0678, 0.0903, 0.1754, 0.1197], device='cuda:0'), in_proj_covar=tensor([0.0656, 0.0593, 0.0542, 0.0672, 0.0445, 0.0767, 0.0822, 0.0592], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-18 23:37:13,998 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324113.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:14,563 INFO [finetune.py:992] (0/2) Epoch 19, batch 3900, loss[loss=0.1681, simple_loss=0.2606, pruned_loss=0.03777, over 12017.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2511, pruned_loss=0.03531, over 2373852.93 frames. ], batch size: 42, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:37:22,792 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324126.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:47,387 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324161.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:49,251 INFO [finetune.py:992] (0/2) Epoch 19, batch 3950, loss[loss=0.1912, simple_loss=0.2809, pruned_loss=0.0508, over 11992.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2525, pruned_loss=0.03593, over 2369961.74 frames. ], batch size: 42, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:37:56,618 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.01 vs. limit=5.0 2023-05-18 23:38:16,541 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.704e+02 3.151e+02 3.838e+02 9.620e+02, threshold=6.301e+02, percent-clipped=2.0 2023-05-18 23:38:21,124 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.09 vs. limit=5.0 2023-05-18 23:38:24,966 INFO [finetune.py:992] (0/2) Epoch 19, batch 4000, loss[loss=0.1372, simple_loss=0.2224, pruned_loss=0.02599, over 12189.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2523, pruned_loss=0.03619, over 2365375.46 frames. ], batch size: 29, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:38:33,774 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.65 vs. 
limit=5.0 2023-05-18 23:38:52,318 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0644, 4.9372, 4.8363, 4.9598, 4.2557, 5.0911, 5.0271, 5.1456], device='cuda:0'), covar=tensor([0.0287, 0.0193, 0.0230, 0.0387, 0.1125, 0.0460, 0.0209, 0.0236], device='cuda:0'), in_proj_covar=tensor([0.0211, 0.0211, 0.0203, 0.0262, 0.0253, 0.0235, 0.0190, 0.0247], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:38:55,835 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2893, 4.6160, 2.9080, 2.6555, 3.9463, 2.5428, 3.8892, 3.1927], device='cuda:0'), covar=tensor([0.0771, 0.0556, 0.1182, 0.1543, 0.0300, 0.1459, 0.0483, 0.0847], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0265, 0.0180, 0.0206, 0.0146, 0.0189, 0.0203, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:38:59,868 INFO [finetune.py:992] (0/2) Epoch 19, batch 4050, loss[loss=0.1501, simple_loss=0.2334, pruned_loss=0.03338, over 11738.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2529, pruned_loss=0.03628, over 2356169.39 frames. ], batch size: 26, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:39:26,265 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.739e+02 3.058e+02 3.734e+02 6.379e+02, threshold=6.115e+02, percent-clipped=1.0 2023-05-18 23:39:34,539 INFO [finetune.py:992] (0/2) Epoch 19, batch 4100, loss[loss=0.1595, simple_loss=0.2496, pruned_loss=0.03469, over 11867.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2513, pruned_loss=0.03542, over 2368574.53 frames. ], batch size: 44, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:39:36,036 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=324316.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:39:36,270 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.94 vs. limit=5.0 2023-05-18 23:39:38,631 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324320.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:39:53,731 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324342.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:09,359 INFO [finetune.py:992] (0/2) Epoch 19, batch 4150, loss[loss=0.149, simple_loss=0.2359, pruned_loss=0.03109, over 12197.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2521, pruned_loss=0.03571, over 2364704.27 frames. ], batch size: 29, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:40:12,276 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324368.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:18,485 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=324377.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 23:40:21,807 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324381.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:32,613 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-05-18 23:40:35,951 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.684e+02 3.086e+02 3.627e+02 7.219e+02, threshold=6.172e+02, percent-clipped=3.0 2023-05-18 23:40:44,279 INFO [finetune.py:992] (0/2) Epoch 19, batch 4200, loss[loss=0.161, simple_loss=0.2525, pruned_loss=0.03479, over 12050.00 frames. 
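
The zipformer.py:625 entries report, per encoder stack, a warmup window in batches (warmup_begin / warmup_end), the global batch_count, and the layers chosen to be skipped for that batch; at this point in fine-tuning batch_count is far past every warmup_end and num_to_drop is almost always 0, occasionally 1. A batch-count-driven sketch of such a choice follows; the drop probabilities, and the idea that drops are more frequent inside the warmup window, are assumptions for illustration.

    import random

    def choose_layers_to_drop(batch_count, warmup_begin, warmup_end, num_layers,
                              p_inside=0.5, p_after=0.05):
        """Pick encoder layers to bypass for this batch, driven by the global batch count."""
        if batch_count < warmup_begin:
            p = 0.0
        elif batch_count < warmup_end:
            p = p_inside          # assumed: drop more aggressively during warmup
        else:
            p = p_after           # assumed: small residual drop rate afterwards
        layers_to_drop = {i for i in range(num_layers) if random.random() < p}
        return len(layers_to_drop), layers_to_drop

    # values taken from one of the entries above
    print(choose_layers_to_drop(batch_count=324316.0, warmup_begin=2666.7,
                                warmup_end=3333.3, num_layers=4))
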
], tot_loss[loss=0.1618, simple_loss=0.2521, pruned_loss=0.0357, over 2372575.74 frames. ], batch size: 40, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:40:52,617 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324426.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:54,098 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5151, 2.5667, 3.1955, 4.3127, 2.2441, 4.3127, 4.4928, 4.5349], device='cuda:0'), covar=tensor([0.0166, 0.1250, 0.0546, 0.0188, 0.1449, 0.0258, 0.0157, 0.0107], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0207, 0.0186, 0.0126, 0.0193, 0.0186, 0.0185, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:40:54,653 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324429.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:41:18,994 INFO [finetune.py:992] (0/2) Epoch 19, batch 4250, loss[loss=0.1531, simple_loss=0.245, pruned_loss=0.03056, over 12104.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2519, pruned_loss=0.03567, over 2371759.58 frames. ], batch size: 32, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:41:25,176 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-05-18 23:41:26,074 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324474.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:41:29,736 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1423, 4.9841, 4.9082, 4.9760, 4.6434, 5.0748, 5.1100, 5.2042], device='cuda:0'), covar=tensor([0.0195, 0.0154, 0.0175, 0.0302, 0.0704, 0.0269, 0.0131, 0.0187], device='cuda:0'), in_proj_covar=tensor([0.0209, 0.0210, 0.0201, 0.0260, 0.0251, 0.0233, 0.0189, 0.0245], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:41:45,975 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.545e+02 2.627e+02 3.023e+02 3.586e+02 8.523e+02, threshold=6.045e+02, percent-clipped=1.0 2023-05-18 23:41:54,191 INFO [finetune.py:992] (0/2) Epoch 19, batch 4300, loss[loss=0.1487, simple_loss=0.2363, pruned_loss=0.03054, over 12162.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2515, pruned_loss=0.03534, over 2381013.51 frames. ], batch size: 34, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:42:29,914 INFO [finetune.py:992] (0/2) Epoch 19, batch 4350, loss[loss=0.1762, simple_loss=0.2781, pruned_loss=0.03719, over 12361.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2514, pruned_loss=0.03535, over 2380721.51 frames. ], batch size: 36, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:42:36,449 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. 
limit=2.0 2023-05-18 23:42:52,899 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1479, 4.6776, 4.8275, 5.0119, 4.8082, 5.0027, 4.8609, 2.5707], device='cuda:0'), covar=tensor([0.0098, 0.0076, 0.0101, 0.0057, 0.0049, 0.0094, 0.0085, 0.0929], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0084, 0.0088, 0.0077, 0.0064, 0.0098, 0.0085, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:42:56,576 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.719e+02 3.276e+02 3.849e+02 6.262e+02, threshold=6.552e+02, percent-clipped=3.0 2023-05-18 23:43:04,803 INFO [finetune.py:992] (0/2) Epoch 19, batch 4400, loss[loss=0.1565, simple_loss=0.2397, pruned_loss=0.03662, over 12344.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.252, pruned_loss=0.03571, over 2384716.02 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:43:24,875 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324642.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:43:26,392 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3977, 4.8353, 3.1681, 2.8662, 4.1050, 2.7894, 4.0528, 3.5322], device='cuda:0'), covar=tensor([0.0734, 0.0616, 0.1014, 0.1473, 0.0347, 0.1317, 0.0517, 0.0733], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0264, 0.0180, 0.0204, 0.0145, 0.0187, 0.0202, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:43:40,181 INFO [finetune.py:992] (0/2) Epoch 19, batch 4450, loss[loss=0.1547, simple_loss=0.2443, pruned_loss=0.03252, over 12175.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2518, pruned_loss=0.03593, over 2378855.26 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:43:45,693 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324672.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 23:43:58,690 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324690.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:44:00,967 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2009, 4.5083, 4.0508, 4.8071, 4.5462, 2.8512, 3.9797, 2.9617], device='cuda:0'), covar=tensor([0.0845, 0.0843, 0.1456, 0.0641, 0.1095, 0.1879, 0.1318, 0.3675], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0387, 0.0372, 0.0348, 0.0382, 0.0285, 0.0356, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:44:06,791 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.672e+02 3.182e+02 3.780e+02 7.429e+02, threshold=6.365e+02, percent-clipped=1.0 2023-05-18 23:44:15,087 INFO [finetune.py:992] (0/2) Epoch 19, batch 4500, loss[loss=0.1635, simple_loss=0.2544, pruned_loss=0.03631, over 12379.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2524, pruned_loss=0.03631, over 2370928.21 frames. 
], batch size: 38, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:44:40,088 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1924, 4.7409, 2.9383, 2.5793, 3.9345, 2.6412, 3.9225, 3.2822], device='cuda:0'), covar=tensor([0.0881, 0.0529, 0.1189, 0.1738, 0.0413, 0.1391, 0.0539, 0.0840], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0263, 0.0180, 0.0204, 0.0145, 0.0187, 0.0202, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:44:49,601 INFO [finetune.py:992] (0/2) Epoch 19, batch 4550, loss[loss=0.1451, simple_loss=0.2341, pruned_loss=0.02803, over 12175.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2519, pruned_loss=0.03622, over 2372058.87 frames. ], batch size: 29, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:44:56,211 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1727, 3.7422, 3.9510, 4.1950, 2.9014, 3.6463, 2.6062, 3.9422], device='cuda:0'), covar=tensor([0.1645, 0.0827, 0.0881, 0.0661, 0.1206, 0.0715, 0.1841, 0.1121], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0273, 0.0303, 0.0365, 0.0249, 0.0248, 0.0264, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:45:11,951 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3965, 4.9754, 5.3875, 4.7392, 5.0603, 4.8288, 5.4095, 4.9687], device='cuda:0'), covar=tensor([0.0284, 0.0419, 0.0297, 0.0257, 0.0450, 0.0325, 0.0204, 0.0397], device='cuda:0'), in_proj_covar=tensor([0.0284, 0.0285, 0.0311, 0.0281, 0.0282, 0.0281, 0.0257, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:45:17,026 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.742e+02 3.078e+02 3.771e+02 5.857e+02, threshold=6.156e+02, percent-clipped=0.0 2023-05-18 23:45:22,237 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8007, 3.5083, 3.5790, 3.7695, 3.7237, 3.8060, 3.6834, 2.5581], device='cuda:0'), covar=tensor([0.0108, 0.0138, 0.0159, 0.0080, 0.0078, 0.0132, 0.0101, 0.0780], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0084, 0.0088, 0.0077, 0.0064, 0.0099, 0.0086, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:45:25,540 INFO [finetune.py:992] (0/2) Epoch 19, batch 4600, loss[loss=0.1618, simple_loss=0.2485, pruned_loss=0.03755, over 11601.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2513, pruned_loss=0.03585, over 2374819.28 frames. ], batch size: 48, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:45:36,606 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 23:45:36,831 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0349, 6.0113, 5.8036, 5.3833, 5.1609, 5.9383, 5.5694, 5.3479], device='cuda:0'), covar=tensor([0.0901, 0.1135, 0.0662, 0.1800, 0.0883, 0.0734, 0.1685, 0.1091], device='cuda:0'), in_proj_covar=tensor([0.0648, 0.0583, 0.0533, 0.0661, 0.0441, 0.0752, 0.0806, 0.0583], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 23:46:00,916 INFO [finetune.py:992] (0/2) Epoch 19, batch 4650, loss[loss=0.1354, simple_loss=0.2234, pruned_loss=0.02371, over 12366.00 frames. 
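
The zipformer.py:1454 entries print attn_weights_entropy as an eight-element tensor together with covariance statistics. The log does not say whether the eight values are per attention head or per example; one plausible reading is a per-head mean entropy of the softmaxed attention weights, sketched below.

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """attn_weights: (num_heads, tgt_len, src_len), rows already softmaxed.
        Returns one entropy value (in nats) per head, averaged over target positions."""
        eps = 1.0e-20
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)   # (heads, tgt)
        return ent.mean(dim=-1)                                          # (heads,)

    w = torch.softmax(torch.randn(8, 200, 200), dim=-1)
    print(attn_weights_entropy(w))   # eight values, in the same 4-5 nat range as the log
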
], tot_loss[loss=0.1615, simple_loss=0.2513, pruned_loss=0.03583, over 2378445.23 frames. ], batch size: 30, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:46:27,194 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.603e+02 3.036e+02 3.552e+02 5.344e+02, threshold=6.072e+02, percent-clipped=0.0 2023-05-18 23:46:35,304 INFO [finetune.py:992] (0/2) Epoch 19, batch 4700, loss[loss=0.1633, simple_loss=0.2583, pruned_loss=0.03419, over 12279.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2517, pruned_loss=0.03591, over 2373735.66 frames. ], batch size: 37, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:46:50,771 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0816, 4.9369, 4.8859, 4.9504, 4.6018, 5.0821, 5.0375, 5.2432], device='cuda:0'), covar=tensor([0.0267, 0.0189, 0.0212, 0.0351, 0.0802, 0.0366, 0.0180, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0209, 0.0209, 0.0201, 0.0260, 0.0250, 0.0232, 0.0188, 0.0244], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:47:10,887 INFO [finetune.py:992] (0/2) Epoch 19, batch 4750, loss[loss=0.1774, simple_loss=0.269, pruned_loss=0.04286, over 11596.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2514, pruned_loss=0.03585, over 2381511.14 frames. ], batch size: 48, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:47:16,648 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324972.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:47:38,231 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.723e+02 2.992e+02 3.345e+02 4.905e+02, threshold=5.983e+02, percent-clipped=0.0 2023-05-18 23:47:46,649 INFO [finetune.py:992] (0/2) Epoch 19, batch 4800, loss[loss=0.1402, simple_loss=0.2352, pruned_loss=0.02263, over 12247.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.251, pruned_loss=0.03582, over 2378372.44 frames. ], batch size: 32, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:47:50,754 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=325020.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:47:56,175 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.8232, 5.7760, 5.7753, 5.0835, 4.9594, 5.8665, 4.9977, 5.2621], device='cuda:0'), covar=tensor([0.1177, 0.1541, 0.0968, 0.2801, 0.1330, 0.1152, 0.3052, 0.1995], device='cuda:0'), in_proj_covar=tensor([0.0647, 0.0579, 0.0534, 0.0658, 0.0437, 0.0751, 0.0803, 0.0581], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-18 23:48:21,040 INFO [finetune.py:992] (0/2) Epoch 19, batch 4850, loss[loss=0.1608, simple_loss=0.2513, pruned_loss=0.03515, over 10556.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2512, pruned_loss=0.03596, over 2379522.35 frames. ], batch size: 69, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:48:22,307 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.84 vs. limit=5.0 2023-05-18 23:48:47,870 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.618e+02 3.035e+02 3.810e+02 8.195e+02, threshold=6.070e+02, percent-clipped=4.0 2023-05-18 23:48:56,296 INFO [finetune.py:992] (0/2) Epoch 19, batch 4900, loss[loss=0.1841, simple_loss=0.2762, pruned_loss=0.04606, over 12349.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2509, pruned_loss=0.03587, over 2387642.51 frames. 
], batch size: 36, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:49:05,185 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 23:49:31,705 INFO [finetune.py:992] (0/2) Epoch 19, batch 4950, loss[loss=0.1702, simple_loss=0.2624, pruned_loss=0.03904, over 12039.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.251, pruned_loss=0.03574, over 2385629.94 frames. ], batch size: 40, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:49:34,703 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2270, 3.7048, 3.8599, 4.2214, 2.9407, 3.6628, 2.5746, 3.7921], device='cuda:0'), covar=tensor([0.1636, 0.0852, 0.0916, 0.0735, 0.1147, 0.0715, 0.1865, 0.0946], device='cuda:0'), in_proj_covar=tensor([0.0230, 0.0270, 0.0298, 0.0361, 0.0247, 0.0245, 0.0262, 0.0369], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-18 23:49:58,746 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.620e+02 3.044e+02 3.772e+02 8.318e+02, threshold=6.087e+02, percent-clipped=2.0 2023-05-18 23:50:03,600 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6616, 4.6360, 4.5155, 4.1283, 4.2503, 4.6050, 4.3372, 4.2146], device='cuda:0'), covar=tensor([0.0938, 0.1016, 0.0756, 0.1608, 0.1861, 0.0936, 0.1584, 0.1140], device='cuda:0'), in_proj_covar=tensor([0.0651, 0.0585, 0.0539, 0.0663, 0.0442, 0.0760, 0.0811, 0.0586], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 23:50:07,641 INFO [finetune.py:992] (0/2) Epoch 19, batch 5000, loss[loss=0.1433, simple_loss=0.2283, pruned_loss=0.02918, over 12131.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.251, pruned_loss=0.03591, over 2377220.85 frames. ], batch size: 30, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:50:08,493 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0111, 5.9887, 5.7266, 5.2398, 5.1404, 5.8420, 5.4821, 5.2647], device='cuda:0'), covar=tensor([0.0823, 0.0989, 0.0744, 0.1866, 0.0779, 0.0837, 0.1661, 0.1135], device='cuda:0'), in_proj_covar=tensor([0.0651, 0.0585, 0.0539, 0.0663, 0.0442, 0.0760, 0.0811, 0.0587], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-18 23:50:22,474 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1056, 4.9850, 4.8950, 4.9655, 4.6136, 5.1074, 5.0824, 5.3151], device='cuda:0'), covar=tensor([0.0195, 0.0170, 0.0211, 0.0384, 0.0732, 0.0279, 0.0144, 0.0171], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0209, 0.0201, 0.0258, 0.0249, 0.0230, 0.0187, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:50:43,199 INFO [finetune.py:992] (0/2) Epoch 19, batch 5050, loss[loss=0.1588, simple_loss=0.2533, pruned_loss=0.0322, over 11714.00 frames. ], tot_loss[loss=0.162, simple_loss=0.252, pruned_loss=0.03596, over 2371532.62 frames. 
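
The scaling.py Whitening entries compare a per-module statistic against a limit (2.0 for the 8-group checks, 5.0 for the single-group, 384-channel ones). One statistic with the right behaviour is the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue: it equals 1.0 when the covariance is perfectly white (isotropic) and grows as energy concentrates in a few directions. The sketch below computes that ratio; it is an illustration of such a metric, not a claim about the exact formula in scaling.py.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """x: (num_frames, num_channels).  Returns a whiteness score that is 1.0
        when each group's channel covariance is proportional to the identity."""
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        cov = torch.einsum("ngi,ngj->gij", x, x) / num_frames   # per-group covariance
        eig = torch.linalg.eigvalsh(cov)                        # eigenvalues per group
        metric = (eig ** 2).mean(dim=1) / eig.mean(dim=1) ** 2
        return metric.mean().item()

    x = torch.randn(1000, 96)                 # roughly white input
    print(whitening_metric(x, num_groups=8))  # close to 1.0
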
], batch size: 48, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:51:07,934 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325300.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:51:09,725 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.639e+02 3.116e+02 3.589e+02 7.174e+02, threshold=6.233e+02, percent-clipped=1.0 2023-05-18 23:51:12,708 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5371, 2.6801, 3.3040, 4.3154, 2.4265, 4.2808, 4.4471, 4.5609], device='cuda:0'), covar=tensor([0.0133, 0.1322, 0.0492, 0.0182, 0.1433, 0.0309, 0.0184, 0.0104], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0206, 0.0187, 0.0126, 0.0190, 0.0185, 0.0185, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:51:17,452 INFO [finetune.py:992] (0/2) Epoch 19, batch 5100, loss[loss=0.1457, simple_loss=0.2302, pruned_loss=0.03063, over 11996.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2517, pruned_loss=0.03592, over 2373661.83 frames. ], batch size: 28, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:51:26,065 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.67 vs. limit=5.0 2023-05-18 23:51:40,205 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2275, 4.7736, 4.2388, 5.0828, 4.5632, 3.0821, 4.2206, 3.0565], device='cuda:0'), covar=tensor([0.0925, 0.0715, 0.1346, 0.0515, 0.1128, 0.1669, 0.1060, 0.3418], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0385, 0.0370, 0.0348, 0.0382, 0.0283, 0.0354, 0.0375], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:51:50,568 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325361.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:51:52,457 INFO [finetune.py:992] (0/2) Epoch 19, batch 5150, loss[loss=0.131, simple_loss=0.2192, pruned_loss=0.02139, over 12192.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2515, pruned_loss=0.03581, over 2370823.73 frames. ], batch size: 29, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:52:11,974 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9103, 5.8585, 5.4501, 5.3416, 5.9156, 5.1144, 5.3166, 5.4343], device='cuda:0'), covar=tensor([0.1702, 0.0929, 0.1186, 0.1967, 0.0955, 0.2302, 0.2207, 0.1228], device='cuda:0'), in_proj_covar=tensor([0.0372, 0.0526, 0.0423, 0.0471, 0.0483, 0.0462, 0.0418, 0.0407], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:52:13,357 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325394.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:52:19,649 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.499e+02 2.990e+02 3.607e+02 7.752e+02, threshold=5.980e+02, percent-clipped=2.0 2023-05-18 23:52:28,034 INFO [finetune.py:992] (0/2) Epoch 19, batch 5200, loss[loss=0.1863, simple_loss=0.2743, pruned_loss=0.04908, over 8140.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.252, pruned_loss=0.03624, over 2351927.14 frames. ], batch size: 97, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:52:53,259 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.11 vs. 
limit=2.0 2023-05-18 23:52:56,452 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325455.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:53:02,565 INFO [finetune.py:992] (0/2) Epoch 19, batch 5250, loss[loss=0.1434, simple_loss=0.2215, pruned_loss=0.03266, over 12270.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2527, pruned_loss=0.03661, over 2359106.52 frames. ], batch size: 28, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:53:30,093 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.506e+02 2.926e+02 3.490e+02 1.307e+03, threshold=5.851e+02, percent-clipped=3.0 2023-05-18 23:53:33,778 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6195, 3.6542, 3.2242, 3.1179, 2.6856, 2.6509, 3.5835, 2.3439], device='cuda:0'), covar=tensor([0.0387, 0.0160, 0.0194, 0.0235, 0.0440, 0.0417, 0.0135, 0.0534], device='cuda:0'), in_proj_covar=tensor([0.0200, 0.0172, 0.0174, 0.0201, 0.0208, 0.0208, 0.0182, 0.0213], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:53:38,284 INFO [finetune.py:992] (0/2) Epoch 19, batch 5300, loss[loss=0.1554, simple_loss=0.2401, pruned_loss=0.03532, over 12336.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03588, over 2364148.02 frames. ], batch size: 30, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:53:42,851 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 23:54:05,596 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-18 23:54:13,377 INFO [finetune.py:992] (0/2) Epoch 19, batch 5350, loss[loss=0.1552, simple_loss=0.2409, pruned_loss=0.03478, over 12118.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2511, pruned_loss=0.03575, over 2366752.03 frames. ], batch size: 33, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:54:15,670 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325567.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:54:28,927 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8430, 3.7366, 3.8802, 3.8745, 3.3252, 3.4460, 3.5471, 3.6854], device='cuda:0'), covar=tensor([0.1582, 0.1246, 0.1798, 0.1162, 0.3104, 0.2219, 0.0924, 0.1754], device='cuda:0'), in_proj_covar=tensor([0.0575, 0.0752, 0.0664, 0.0668, 0.0901, 0.0784, 0.0597, 0.0515], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:54:31,350 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-05-18 23:54:40,800 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.587e+02 3.045e+02 3.701e+02 8.855e+02, threshold=6.090e+02, percent-clipped=2.0 2023-05-18 23:54:48,499 INFO [finetune.py:992] (0/2) Epoch 19, batch 5400, loss[loss=0.1473, simple_loss=0.235, pruned_loss=0.02976, over 12034.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2516, pruned_loss=0.03592, over 2354838.06 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:54:58,166 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325628.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:55:04,550 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.60 vs. 
limit=2.0 2023-05-18 23:55:17,945 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=325656.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:55:20,187 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325659.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:55:23,563 INFO [finetune.py:992] (0/2) Epoch 19, batch 5450, loss[loss=0.1784, simple_loss=0.2666, pruned_loss=0.04512, over 11816.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2522, pruned_loss=0.03616, over 2353595.47 frames. ], batch size: 44, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:55:51,583 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.666e+02 3.067e+02 3.561e+02 6.039e+02, threshold=6.133e+02, percent-clipped=0.0 2023-05-18 23:55:59,205 INFO [finetune.py:992] (0/2) Epoch 19, batch 5500, loss[loss=0.1428, simple_loss=0.2372, pruned_loss=0.02422, over 12357.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03601, over 2349954.58 frames. ], batch size: 35, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:56:03,527 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325720.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:56:24,515 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=325750.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:56:34,085 INFO [finetune.py:992] (0/2) Epoch 19, batch 5550, loss[loss=0.161, simple_loss=0.2572, pruned_loss=0.03247, over 12157.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2525, pruned_loss=0.03613, over 2353034.97 frames. ], batch size: 34, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:57:02,002 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.494e+02 2.975e+02 3.710e+02 8.699e+02, threshold=5.950e+02, percent-clipped=3.0 2023-05-18 23:57:09,542 INFO [finetune.py:992] (0/2) Epoch 19, batch 5600, loss[loss=0.2282, simple_loss=0.3018, pruned_loss=0.07726, over 8348.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2516, pruned_loss=0.03613, over 2356088.33 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:57:44,519 INFO [finetune.py:992] (0/2) Epoch 19, batch 5650, loss[loss=0.1841, simple_loss=0.2702, pruned_loss=0.04901, over 11678.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03584, over 2370986.48 frames. ], batch size: 48, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:58:05,756 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325894.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:58:11,787 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.504e+02 3.020e+02 3.545e+02 6.966e+02, threshold=6.040e+02, percent-clipped=2.0 2023-05-18 23:58:19,622 INFO [finetune.py:992] (0/2) Epoch 19, batch 5700, loss[loss=0.1745, simple_loss=0.2677, pruned_loss=0.04068, over 12108.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2516, pruned_loss=0.03591, over 2375446.30 frames. 
], batch size: 33, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:58:24,614 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0325, 2.5720, 3.7054, 3.0556, 3.4490, 3.1895, 2.6436, 3.5735], device='cuda:0'), covar=tensor([0.0172, 0.0415, 0.0178, 0.0306, 0.0182, 0.0225, 0.0390, 0.0139], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0220, 0.0206, 0.0200, 0.0235, 0.0181, 0.0210, 0.0207], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-18 23:58:25,261 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1165, 5.0052, 4.9245, 4.9646, 4.6429, 5.1661, 5.0719, 5.3027], device='cuda:0'), covar=tensor([0.0252, 0.0162, 0.0202, 0.0376, 0.0752, 0.0350, 0.0162, 0.0172], device='cuda:0'), in_proj_covar=tensor([0.0210, 0.0210, 0.0203, 0.0262, 0.0252, 0.0235, 0.0190, 0.0246], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-18 23:58:25,896 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=325923.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:58:49,116 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325955.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 23:58:49,714 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=325956.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:58:55,275 INFO [finetune.py:992] (0/2) Epoch 19, batch 5750, loss[loss=0.1691, simple_loss=0.2591, pruned_loss=0.03953, over 12046.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2525, pruned_loss=0.03634, over 2370109.12 frames. ], batch size: 37, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:59:21,063 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-226000.pt 2023-05-18 23:59:25,912 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.547e+02 2.942e+02 3.608e+02 6.203e+02, threshold=5.885e+02, percent-clipped=1.0 2023-05-18 23:59:26,136 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2813, 4.8122, 2.9133, 2.7914, 4.1010, 2.7433, 3.9632, 3.3560], device='cuda:0'), covar=tensor([0.0807, 0.0423, 0.1275, 0.1508, 0.0293, 0.1345, 0.0552, 0.0848], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0267, 0.0183, 0.0207, 0.0147, 0.0190, 0.0206, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:59:26,682 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326004.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:59:30,241 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3031, 6.1855, 5.7151, 5.7019, 6.2088, 5.4458, 5.7092, 5.6609], device='cuda:0'), covar=tensor([0.1413, 0.0929, 0.1208, 0.1900, 0.0896, 0.2335, 0.2002, 0.1090], device='cuda:0'), in_proj_covar=tensor([0.0366, 0.0520, 0.0419, 0.0465, 0.0478, 0.0457, 0.0415, 0.0402], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:59:33,646 INFO [finetune.py:992] (0/2) Epoch 19, batch 5800, loss[loss=0.1932, simple_loss=0.2772, pruned_loss=0.05464, over 7811.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2518, pruned_loss=0.03626, over 2373045.81 frames. 
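
The checkpoint.py entries show a checkpoint written at a round global batch count (checkpoint-224000.pt earlier, checkpoint-226000.pt here), i.e. one file every 2000 batches into the experiment directory. A minimal periodic-save sketch under that assumption; the exact contents of the saved dict are illustrative.

    from pathlib import Path
    import torch

    def maybe_save_checkpoint(model, optimizer, scheduler, batch_idx_train: int,
                              exp_dir: Path, save_every_n: int = 2000) -> None:
        """Write exp_dir/checkpoint-<global_batch>.pt every save_every_n batches."""
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        ckpt = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "batch_idx_train": batch_idx_train,
        }
        torch.save(ckpt, exp_dir / f"checkpoint-{batch_idx_train}.pt")
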
], batch size: 98, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:59:34,422 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=326015.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:59:58,157 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7380, 5.4644, 5.0517, 4.9813, 5.5463, 4.8274, 4.9745, 4.9952], device='cuda:0'), covar=tensor([0.1445, 0.1095, 0.1201, 0.2075, 0.0987, 0.2379, 0.2149, 0.1257], device='cuda:0'), in_proj_covar=tensor([0.0366, 0.0519, 0.0419, 0.0464, 0.0478, 0.0457, 0.0415, 0.0401], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-18 23:59:58,822 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326050.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:00:08,294 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-19 00:00:08,474 INFO [finetune.py:992] (0/2) Epoch 19, batch 5850, loss[loss=0.1632, simple_loss=0.2556, pruned_loss=0.03542, over 12318.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2523, pruned_loss=0.03617, over 2372512.35 frames. ], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:00:32,126 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326098.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:00:35,478 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.711e+02 3.114e+02 3.588e+02 1.446e+03, threshold=6.229e+02, percent-clipped=3.0 2023-05-19 00:00:43,240 INFO [finetune.py:992] (0/2) Epoch 19, batch 5900, loss[loss=0.2, simple_loss=0.2801, pruned_loss=0.05994, over 12053.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03659, over 2367578.56 frames. ], batch size: 37, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:01:18,822 INFO [finetune.py:992] (0/2) Epoch 19, batch 5950, loss[loss=0.1899, simple_loss=0.2786, pruned_loss=0.05057, over 12020.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2521, pruned_loss=0.03598, over 2375924.86 frames. ], batch size: 42, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:01:19,582 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3019, 6.1537, 5.7320, 5.7213, 6.1728, 5.4659, 5.6155, 5.6758], device='cuda:0'), covar=tensor([0.1546, 0.0845, 0.1025, 0.1711, 0.0886, 0.2203, 0.1970, 0.1179], device='cuda:0'), in_proj_covar=tensor([0.0366, 0.0518, 0.0418, 0.0463, 0.0478, 0.0457, 0.0415, 0.0402], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:01:46,421 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.527e+02 2.904e+02 3.247e+02 5.520e+02, threshold=5.807e+02, percent-clipped=0.0 2023-05-19 00:01:53,986 INFO [finetune.py:992] (0/2) Epoch 19, batch 6000, loss[loss=0.1652, simple_loss=0.2508, pruned_loss=0.03978, over 12353.00 frames. ], tot_loss[loss=0.162, simple_loss=0.252, pruned_loss=0.03603, over 2368600.48 frames. 
], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:01:53,986 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 00:02:03,071 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6292, 2.8094, 3.9910, 3.3232, 3.7846, 3.5469, 2.8743, 3.8850], device='cuda:0'), covar=tensor([0.0096, 0.0378, 0.0075, 0.0174, 0.0125, 0.0149, 0.0340, 0.0093], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0221, 0.0207, 0.0202, 0.0237, 0.0182, 0.0212, 0.0207], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:02:11,813 INFO [finetune.py:1026] (0/2) Epoch 19, validation: loss=0.3044, simple_loss=0.3846, pruned_loss=0.1121, over 1020973.00 frames. 2023-05-19 00:02:11,813 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 00:02:18,302 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326223.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:02:21,116 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=326227.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:02:37,418 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=326250.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:02:47,159 INFO [finetune.py:992] (0/2) Epoch 19, batch 6050, loss[loss=0.1527, simple_loss=0.2517, pruned_loss=0.02679, over 12356.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2523, pruned_loss=0.03581, over 2373902.21 frames. ], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:02:52,189 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326271.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:04,185 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=326288.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:09,517 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.9981, 4.0081, 4.0168, 4.1073, 3.8698, 3.9313, 3.7648, 4.0016], device='cuda:0'), covar=tensor([0.1453, 0.0770, 0.1458, 0.0767, 0.1938, 0.1286, 0.0698, 0.1137], device='cuda:0'), in_proj_covar=tensor([0.0578, 0.0755, 0.0662, 0.0674, 0.0905, 0.0791, 0.0600, 0.0516], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 00:03:14,239 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.713e+02 3.347e+02 3.935e+02 9.908e+02, threshold=6.694e+02, percent-clipped=3.0 2023-05-19 00:03:22,139 INFO [finetune.py:992] (0/2) Epoch 19, batch 6100, loss[loss=0.154, simple_loss=0.2452, pruned_loss=0.03136, over 12196.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2526, pruned_loss=0.03599, over 2372078.42 frames. ], batch size: 31, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:03:22,953 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326315.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:56,981 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326363.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:57,613 INFO [finetune.py:992] (0/2) Epoch 19, batch 6150, loss[loss=0.1644, simple_loss=0.258, pruned_loss=0.03544, over 12166.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2528, pruned_loss=0.03603, over 2373940.67 frames. 
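
The validation entries above follow a fixed pattern: "Computing validation loss", a result reported over the same 1020973.00 dev frames each time, and the peak CUDA memory. A minimal sketch of such a pass is below; compute_loss and valid_loader are placeholders, and the frame-weighted averaging is an assumption consistent with the per-frame counts in the log.

    import torch

    def run_validation(model, valid_loader, compute_loss, device="cuda:0"):
        """One pass over the dev set: frame-weighted average loss plus peak memory."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                # compute_loss is a placeholder returning (per-frame loss, num_frames)
                loss, num_frames = compute_loss(model, batch)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        max_mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4f}, over {tot_frames:.2f} frames.")
        print(f"Maximum memory allocated so far is {max_mem_mb}MB")
        return tot_loss / tot_frames
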
], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:04:13,805 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2063, 6.1027, 5.6640, 5.5763, 6.1689, 5.4808, 5.6115, 5.6201], device='cuda:0'), covar=tensor([0.1486, 0.0929, 0.1003, 0.2002, 0.0888, 0.2103, 0.1809, 0.1179], device='cuda:0'), in_proj_covar=tensor([0.0369, 0.0521, 0.0420, 0.0465, 0.0480, 0.0460, 0.0417, 0.0403], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:04:25,539 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.626e+02 3.138e+02 3.752e+02 1.193e+03, threshold=6.276e+02, percent-clipped=1.0 2023-05-19 00:04:33,141 INFO [finetune.py:992] (0/2) Epoch 19, batch 6200, loss[loss=0.1352, simple_loss=0.216, pruned_loss=0.02723, over 11851.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2526, pruned_loss=0.03639, over 2370563.77 frames. ], batch size: 26, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:05:00,106 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0668, 6.0297, 5.6207, 5.4986, 6.1332, 5.3977, 5.6147, 5.4925], device='cuda:0'), covar=tensor([0.1612, 0.0970, 0.1299, 0.1826, 0.0917, 0.2327, 0.1858, 0.1294], device='cuda:0'), in_proj_covar=tensor([0.0369, 0.0523, 0.0421, 0.0466, 0.0481, 0.0462, 0.0418, 0.0404], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:05:07,524 INFO [finetune.py:992] (0/2) Epoch 19, batch 6250, loss[loss=0.164, simple_loss=0.242, pruned_loss=0.04296, over 12172.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2527, pruned_loss=0.03644, over 2363428.07 frames. ], batch size: 31, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:05:34,824 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.675e+02 3.318e+02 4.019e+02 6.717e+02, threshold=6.635e+02, percent-clipped=1.0 2023-05-19 00:05:37,292 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-19 00:05:42,296 INFO [finetune.py:992] (0/2) Epoch 19, batch 6300, loss[loss=0.1412, simple_loss=0.2338, pruned_loss=0.02428, over 12357.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2531, pruned_loss=0.03643, over 2361202.73 frames. ], batch size: 30, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:06:07,813 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326550.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:06:07,902 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4955, 2.6580, 3.6245, 4.4221, 3.6648, 4.3971, 3.8176, 3.1002], device='cuda:0'), covar=tensor([0.0042, 0.0391, 0.0158, 0.0054, 0.0136, 0.0078, 0.0147, 0.0423], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0124, 0.0106, 0.0083, 0.0105, 0.0119, 0.0104, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:06:17,561 INFO [finetune.py:992] (0/2) Epoch 19, batch 6350, loss[loss=0.1841, simple_loss=0.278, pruned_loss=0.04509, over 12045.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.03692, over 2360339.20 frames. ], batch size: 42, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:06:30,726 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=326583.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:06:37,216 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. 
limit=2.0 2023-05-19 00:06:40,968 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:06:42,104 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-19 00:06:44,717 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.505e+02 2.938e+02 3.589e+02 9.004e+02, threshold=5.876e+02, percent-clipped=2.0 2023-05-19 00:06:52,562 INFO [finetune.py:992] (0/2) Epoch 19, batch 6400, loss[loss=0.1433, simple_loss=0.225, pruned_loss=0.0308, over 12349.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.03688, over 2355729.03 frames. ], batch size: 30, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:06:54,065 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1931, 5.9961, 5.5693, 5.5668, 6.1074, 5.3472, 5.4931, 5.5884], device='cuda:0'), covar=tensor([0.1449, 0.0955, 0.1044, 0.1734, 0.0843, 0.2303, 0.1998, 0.1048], device='cuda:0'), in_proj_covar=tensor([0.0369, 0.0521, 0.0420, 0.0464, 0.0482, 0.0463, 0.0416, 0.0403], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:07:28,628 INFO [finetune.py:992] (0/2) Epoch 19, batch 6450, loss[loss=0.1831, simple_loss=0.2809, pruned_loss=0.0426, over 11599.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2538, pruned_loss=0.03694, over 2352821.58 frames. ], batch size: 48, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:07:49,173 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-05-19 00:07:55,425 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+02 2.775e+02 3.187e+02 3.833e+02 9.993e+02, threshold=6.373e+02, percent-clipped=7.0 2023-05-19 00:08:03,161 INFO [finetune.py:992] (0/2) Epoch 19, batch 6500, loss[loss=0.1833, simple_loss=0.2706, pruned_loss=0.04802, over 12106.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.03686, over 2362582.66 frames. ], batch size: 45, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:08:37,505 INFO [finetune.py:992] (0/2) Epoch 19, batch 6550, loss[loss=0.1394, simple_loss=0.2221, pruned_loss=0.02831, over 12286.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.03693, over 2361872.48 frames. ], batch size: 28, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:08:47,987 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0622, 2.5667, 3.6250, 3.0289, 3.4419, 3.1315, 2.5628, 3.5155], device='cuda:0'), covar=tensor([0.0207, 0.0442, 0.0201, 0.0287, 0.0215, 0.0256, 0.0429, 0.0173], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0220, 0.0207, 0.0201, 0.0235, 0.0182, 0.0211, 0.0207], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:08:49,634 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2023-05-19 00:08:53,594 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 00:08:56,857 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.43 vs. 
limit=5.0 2023-05-19 00:09:05,149 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.594e+02 3.171e+02 3.875e+02 1.476e+03, threshold=6.343e+02, percent-clipped=4.0 2023-05-19 00:09:13,172 INFO [finetune.py:992] (0/2) Epoch 19, batch 6600, loss[loss=0.144, simple_loss=0.2397, pruned_loss=0.02411, over 12102.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2536, pruned_loss=0.03677, over 2362563.88 frames. ], batch size: 33, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:09:47,936 INFO [finetune.py:992] (0/2) Epoch 19, batch 6650, loss[loss=0.1588, simple_loss=0.2589, pruned_loss=0.02934, over 12148.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2526, pruned_loss=0.03635, over 2368545.96 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:09:49,748 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-05-19 00:09:57,661 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8971, 4.7963, 4.7995, 4.8653, 3.8654, 5.0165, 4.9428, 5.0917], device='cuda:0'), covar=tensor([0.0302, 0.0223, 0.0234, 0.0396, 0.1343, 0.0436, 0.0214, 0.0255], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0209, 0.0200, 0.0258, 0.0250, 0.0232, 0.0188, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 00:09:59,145 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7890, 2.3567, 3.5591, 2.9011, 3.4272, 2.9479, 2.2063, 3.4217], device='cuda:0'), covar=tensor([0.0253, 0.0560, 0.0254, 0.0345, 0.0190, 0.0304, 0.0622, 0.0190], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0219, 0.0205, 0.0201, 0.0234, 0.0181, 0.0211, 0.0206], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:10:01,130 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326883.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:10:05,468 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0185, 3.6056, 5.4508, 2.9790, 2.9716, 4.2132, 3.4104, 4.1466], device='cuda:0'), covar=tensor([0.0407, 0.1115, 0.0258, 0.1105, 0.1980, 0.1205, 0.1321, 0.1053], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0244, 0.0271, 0.0191, 0.0243, 0.0302, 0.0232, 0.0279], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:10:07,473 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0621, 3.5902, 5.3867, 2.9340, 2.8824, 4.0843, 3.3585, 4.0199], device='cuda:0'), covar=tensor([0.0347, 0.1106, 0.0293, 0.1152, 0.2020, 0.1398, 0.1349, 0.1275], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0244, 0.0270, 0.0191, 0.0243, 0.0302, 0.0232, 0.0279], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:10:14,919 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.587e+02 3.102e+02 3.531e+02 1.010e+03, threshold=6.205e+02, percent-clipped=2.0 2023-05-19 00:10:18,802 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.85 vs. limit=5.0 2023-05-19 00:10:22,723 INFO [finetune.py:992] (0/2) Epoch 19, batch 6700, loss[loss=0.1538, simple_loss=0.2501, pruned_loss=0.02874, over 12191.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2523, pruned_loss=0.03615, over 2373657.83 frames. 
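The zipformer.py:1454 entries print attn_weights_entropy, one value per attention head together with covariance statistics. The entropy itself is the Shannon entropy of each head's attention distribution averaged over query positions, which is bounded above by the log of the number of keys; the values of roughly 2 to 6 nats above are consistent with that. A minimal sketch, assuming a tensor of already-normalized attention weights of shape (num_heads, num_queries, num_keys):

import torch

def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """attn_weights: (num_heads, num_queries, num_keys), each row summing to 1.
    Returns the mean Shannon entropy per head, in nats, shape (num_heads,)."""
    p = attn_weights.clamp(min=eps)
    entropy = -(p * p.log()).sum(dim=-1)   # entropy of each query's attention distribution
    return entropy.mean(dim=-1)            # average over query positions

# uniform attention over 128 keys gives log(128), about 4.85 nats, for every head
print(attn_weights_entropy(torch.full((8, 10, 128), 1.0 / 128)))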
], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:10:34,968 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326931.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:10:58,202 INFO [finetune.py:992] (0/2) Epoch 19, batch 6750, loss[loss=0.1767, simple_loss=0.2761, pruned_loss=0.03866, over 12184.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03623, over 2372015.68 frames. ], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:11:16,630 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3705, 4.7897, 3.1094, 2.6593, 4.0378, 2.7706, 4.0705, 3.3187], device='cuda:0'), covar=tensor([0.0687, 0.0543, 0.1095, 0.1672, 0.0350, 0.1319, 0.0487, 0.0816], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0268, 0.0184, 0.0208, 0.0149, 0.0190, 0.0208, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:11:25,746 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.626e+02 2.551e+02 2.862e+02 3.448e+02 6.454e+02, threshold=5.724e+02, percent-clipped=1.0 2023-05-19 00:11:27,299 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327005.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:11:33,282 INFO [finetune.py:992] (0/2) Epoch 19, batch 6800, loss[loss=0.1579, simple_loss=0.2486, pruned_loss=0.03357, over 12149.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2522, pruned_loss=0.03605, over 2375624.31 frames. ], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:12:08,701 INFO [finetune.py:992] (0/2) Epoch 19, batch 6850, loss[loss=0.1668, simple_loss=0.2502, pruned_loss=0.04174, over 12148.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2524, pruned_loss=0.03602, over 2379087.11 frames. ], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:12:10,269 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327066.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:12:22,206 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0276, 4.6962, 4.8246, 4.9520, 4.7351, 5.0209, 4.8745, 2.7763], device='cuda:0'), covar=tensor([0.0099, 0.0091, 0.0103, 0.0071, 0.0058, 0.0100, 0.0097, 0.0835], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0085, 0.0088, 0.0077, 0.0064, 0.0099, 0.0086, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:12:24,607 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-19 00:12:35,627 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.502e+02 3.064e+02 3.768e+02 7.870e+02, threshold=6.129e+02, percent-clipped=2.0 2023-05-19 00:12:43,157 INFO [finetune.py:992] (0/2) Epoch 19, batch 6900, loss[loss=0.1354, simple_loss=0.2178, pruned_loss=0.0265, over 12127.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2522, pruned_loss=0.03615, over 2375362.33 frames. ], batch size: 30, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:13:01,145 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 00:13:06,401 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327147.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:13:18,033 INFO [finetune.py:992] (0/2) Epoch 19, batch 6950, loss[loss=0.1445, simple_loss=0.2226, pruned_loss=0.0332, over 11987.00 frames. 
], tot_loss[loss=0.1616, simple_loss=0.2516, pruned_loss=0.0358, over 2377632.51 frames. ], batch size: 28, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:13:22,482 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-05-19 00:13:45,928 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.614e+02 2.637e+02 3.078e+02 3.550e+02 1.130e+03, threshold=6.155e+02, percent-clipped=5.0 2023-05-19 00:13:49,672 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327208.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:13:53,758 INFO [finetune.py:992] (0/2) Epoch 19, batch 7000, loss[loss=0.1691, simple_loss=0.2649, pruned_loss=0.03665, over 12077.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2519, pruned_loss=0.03605, over 2373547.31 frames. ], batch size: 32, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:13:58,732 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2315, 4.6055, 2.9649, 2.6160, 4.0279, 2.6686, 3.8927, 3.1351], device='cuda:0'), covar=tensor([0.0742, 0.0603, 0.1130, 0.1598, 0.0296, 0.1280, 0.0593, 0.0860], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0268, 0.0183, 0.0208, 0.0148, 0.0189, 0.0207, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:14:00,171 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3233, 4.0652, 4.0812, 4.4456, 2.8337, 3.7610, 2.7142, 3.9602], device='cuda:0'), covar=tensor([0.1677, 0.0777, 0.0777, 0.0591, 0.1367, 0.0793, 0.1822, 0.1112], device='cuda:0'), in_proj_covar=tensor([0.0232, 0.0273, 0.0301, 0.0366, 0.0249, 0.0248, 0.0264, 0.0373], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:14:08,658 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-19 00:14:16,137 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4204, 2.1890, 3.7393, 4.3935, 3.7881, 4.2203, 3.9121, 2.9207], device='cuda:0'), covar=tensor([0.0054, 0.0553, 0.0149, 0.0048, 0.0147, 0.0111, 0.0116, 0.0440], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0123, 0.0105, 0.0083, 0.0105, 0.0118, 0.0103, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:14:28,996 INFO [finetune.py:992] (0/2) Epoch 19, batch 7050, loss[loss=0.169, simple_loss=0.2737, pruned_loss=0.03216, over 12141.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2522, pruned_loss=0.03632, over 2367808.99 frames. 
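In the per-batch summaries above, the reported loss is numerically 0.5 * simple_loss + pruned_loss: for batch 7000, 0.5 * 0.2649 + 0.03665 = 0.1691, and the same holds for the batch 6950 and 7050 lines. The snippet below only verifies that relation on values copied from these log lines; the 0.5 weight is read off the log, not asserted about the training code.

# (loss, simple_loss, pruned_loss) triples copied from the batch 6950, 7000 and 7050 lines above
samples = [
    (0.1445, 0.2226, 0.0332),
    (0.1691, 0.2649, 0.03665),
    (0.169, 0.2737, 0.03216),
]
for loss, simple, pruned in samples:
    recon = 0.5 * simple + pruned
    print(f"logged={loss:.4f}  0.5*simple+pruned={recon:.4f}  diff={abs(loss - recon):.5f}")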
], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:14:47,013 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2737, 5.1345, 5.1242, 5.1156, 4.7921, 5.2114, 5.2189, 5.4791], device='cuda:0'), covar=tensor([0.0263, 0.0168, 0.0166, 0.0376, 0.0759, 0.0547, 0.0151, 0.0163], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0210, 0.0201, 0.0260, 0.0252, 0.0234, 0.0189, 0.0244], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 00:14:47,659 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2978, 4.9406, 5.3060, 4.5827, 4.9701, 4.6767, 5.2893, 5.0601], device='cuda:0'), covar=tensor([0.0350, 0.0386, 0.0309, 0.0297, 0.0477, 0.0344, 0.0301, 0.0296], device='cuda:0'), in_proj_covar=tensor([0.0284, 0.0287, 0.0313, 0.0281, 0.0281, 0.0283, 0.0258, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:14:55,923 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.679e+02 2.925e+02 3.583e+02 6.371e+02, threshold=5.851e+02, percent-clipped=2.0 2023-05-19 00:15:03,662 INFO [finetune.py:992] (0/2) Epoch 19, batch 7100, loss[loss=0.1638, simple_loss=0.2581, pruned_loss=0.03475, over 12117.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.252, pruned_loss=0.03609, over 2373593.70 frames. ], batch size: 39, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:15:25,825 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6110, 5.4805, 5.0147, 5.0392, 5.5737, 4.8042, 5.0663, 5.0694], device='cuda:0'), covar=tensor([0.1548, 0.1017, 0.1283, 0.1904, 0.0936, 0.2204, 0.1883, 0.1322], device='cuda:0'), in_proj_covar=tensor([0.0373, 0.0525, 0.0426, 0.0471, 0.0486, 0.0465, 0.0423, 0.0406], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:15:31,830 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 00:15:37,135 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327361.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:15:39,156 INFO [finetune.py:992] (0/2) Epoch 19, batch 7150, loss[loss=0.1616, simple_loss=0.2645, pruned_loss=0.02931, over 12079.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2528, pruned_loss=0.0362, over 2367783.83 frames. ], batch size: 42, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:15:40,510 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.55 vs. 
limit=5.0 2023-05-19 00:15:47,086 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5965, 3.1704, 5.1135, 2.7789, 2.7109, 3.9289, 3.0192, 3.9857], device='cuda:0'), covar=tensor([0.0580, 0.1458, 0.0302, 0.1238, 0.2207, 0.1426, 0.1705, 0.1109], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0242, 0.0268, 0.0191, 0.0242, 0.0300, 0.0232, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:15:47,676 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.2075, 6.1079, 5.9408, 5.5136, 5.3430, 6.1141, 5.7094, 5.4821], device='cuda:0'), covar=tensor([0.0699, 0.1036, 0.0671, 0.1783, 0.0699, 0.0694, 0.1579, 0.0959], device='cuda:0'), in_proj_covar=tensor([0.0669, 0.0596, 0.0552, 0.0673, 0.0449, 0.0776, 0.0828, 0.0595], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 00:16:07,235 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.644e+02 3.155e+02 3.771e+02 7.051e+02, threshold=6.311e+02, percent-clipped=3.0 2023-05-19 00:16:14,858 INFO [finetune.py:992] (0/2) Epoch 19, batch 7200, loss[loss=0.1692, simple_loss=0.2621, pruned_loss=0.03819, over 12206.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2517, pruned_loss=0.03593, over 2375809.62 frames. ], batch size: 35, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:16:49,217 INFO [finetune.py:992] (0/2) Epoch 19, batch 7250, loss[loss=0.178, simple_loss=0.2685, pruned_loss=0.0437, over 12115.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2518, pruned_loss=0.0359, over 2380499.75 frames. ], batch size: 39, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:17:17,152 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.676e+02 2.734e+02 3.338e+02 3.881e+02 7.331e+02, threshold=6.676e+02, percent-clipped=2.0 2023-05-19 00:17:17,240 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327503.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:17:17,373 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4762, 2.2883, 3.0077, 4.1362, 2.3877, 4.2581, 4.4435, 4.4133], device='cuda:0'), covar=tensor([0.0146, 0.1488, 0.0583, 0.0222, 0.1302, 0.0315, 0.0155, 0.0143], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0212, 0.0191, 0.0129, 0.0196, 0.0189, 0.0188, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 00:17:24,852 INFO [finetune.py:992] (0/2) Epoch 19, batch 7300, loss[loss=0.1425, simple_loss=0.2284, pruned_loss=0.02827, over 12102.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2519, pruned_loss=0.03585, over 2382977.48 frames. ], batch size: 32, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:17:25,049 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327514.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:17:59,134 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327562.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:18:00,402 INFO [finetune.py:992] (0/2) Epoch 19, batch 7350, loss[loss=0.1644, simple_loss=0.2649, pruned_loss=0.03191, over 12038.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2511, pruned_loss=0.03533, over 2390126.03 frames. 
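The grad_scale field in the batch summaries moves from 16.0 to 32.0 around batch 7100 and later drops back to 16.0, which is the behaviour of a dynamic fp16 loss scale that doubles after a run of overflow-free steps and backs off when an overflow occurs. A generic torch.cuda.amp training step showing where that scale lives is sketched below; the init_scale and growth_interval values are illustrative rather than the settings of this run, and model, optimizer and criterion are placeholders.

import torch

def train_step(model, optimizer, scaler, feats, targets, criterion):
    """One mixed-precision step; scaler.get_scale() is the 'grad_scale' printed in the log."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(feats), targets)
    scaler.scale(loss).backward()   # gradients carry the current loss scale
    scaler.step(optimizer)          # unscales, and skips the update on inf/nan gradients
    scaler.update()                 # grows the scale after enough clean steps, halves it on overflow
    return loss.detach(), scaler.get_scale()

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)  # illustrative values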
], batch size: 37, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:18:07,968 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327575.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:18:28,180 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.443e+02 2.967e+02 3.435e+02 6.066e+02, threshold=5.934e+02, percent-clipped=0.0 2023-05-19 00:18:34,919 INFO [finetune.py:992] (0/2) Epoch 19, batch 7400, loss[loss=0.172, simple_loss=0.2595, pruned_loss=0.04226, over 12090.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2511, pruned_loss=0.03563, over 2379803.08 frames. ], batch size: 32, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:18:41,333 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327623.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:18:53,101 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-19 00:19:06,778 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.58 vs. limit=5.0 2023-05-19 00:19:07,862 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=327661.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:19:09,777 INFO [finetune.py:992] (0/2) Epoch 19, batch 7450, loss[loss=0.1768, simple_loss=0.2707, pruned_loss=0.04149, over 11780.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2515, pruned_loss=0.0358, over 2378684.43 frames. ], batch size: 44, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:19:31,187 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6906, 2.6920, 3.8000, 4.6281, 3.8831, 4.6385, 3.9714, 3.0977], device='cuda:0'), covar=tensor([0.0042, 0.0433, 0.0148, 0.0058, 0.0136, 0.0084, 0.0135, 0.0411], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0123, 0.0106, 0.0083, 0.0106, 0.0118, 0.0104, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:19:37,983 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.581e+02 3.083e+02 3.815e+02 6.461e+02, threshold=6.167e+02, percent-clipped=2.0 2023-05-19 00:19:41,450 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=327709.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:19:44,892 INFO [finetune.py:992] (0/2) Epoch 19, batch 7500, loss[loss=0.1396, simple_loss=0.2251, pruned_loss=0.0271, over 12120.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2511, pruned_loss=0.03578, over 2369592.03 frames. ], batch size: 30, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:20:19,055 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2119, 4.5706, 4.1846, 4.9184, 4.5330, 2.7913, 3.9957, 2.9664], device='cuda:0'), covar=tensor([0.0860, 0.0863, 0.1342, 0.0622, 0.1067, 0.1909, 0.1395, 0.3665], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0388, 0.0370, 0.0349, 0.0380, 0.0282, 0.0355, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:20:19,479 INFO [finetune.py:992] (0/2) Epoch 19, batch 7550, loss[loss=0.1722, simple_loss=0.2661, pruned_loss=0.03918, over 12362.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2527, pruned_loss=0.03637, over 2359579.43 frames. 
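The tot_loss[...] figures above move slowly while the per-batch loss jumps around, and the accompanying frame count stays near 2.37 million instead of growing without bound, which is what a decayed, frame-weighted running average looks like. A small tracker with that behaviour is sketched below; the window constant is an assumption, and the actual averaging in finetune.py may differ.

class RunningLoss:
    """Frame-weighted running average with exponential forgetting: each batch adds
    (loss * frames, frames) and older batches decay by (1 - 1/window), so the
    effective frame count settles near window * frames_per_batch."""

    def __init__(self, window: float = 200.0):   # the window length is an assumption
        self.window = window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        decay = 1.0 - 1.0 / self.window
        self.loss_sum = self.loss_sum * decay + batch_loss * batch_frames
        self.frames = self.frames * decay + batch_frames
        return self.loss_sum / self.frames, self.frames   # tot_loss and the 'over N frames' count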
], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:20:32,833 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0772, 2.5921, 3.6912, 3.0105, 3.4408, 3.1422, 2.6127, 3.5551], device='cuda:0'), covar=tensor([0.0152, 0.0396, 0.0140, 0.0281, 0.0157, 0.0200, 0.0376, 0.0133], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0218, 0.0206, 0.0202, 0.0234, 0.0181, 0.0211, 0.0207], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:20:36,502 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-19 00:20:47,830 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=327803.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:20:48,343 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.705e+02 3.112e+02 3.698e+02 7.548e+02, threshold=6.224e+02, percent-clipped=1.0 2023-05-19 00:20:55,362 INFO [finetune.py:992] (0/2) Epoch 19, batch 7600, loss[loss=0.1599, simple_loss=0.2538, pruned_loss=0.03297, over 12091.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2522, pruned_loss=0.03597, over 2371892.15 frames. ], batch size: 42, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:21:16,944 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2203, 6.1216, 5.7595, 5.6551, 6.1776, 5.5118, 5.6378, 5.6790], device='cuda:0'), covar=tensor([0.1540, 0.0821, 0.0957, 0.1640, 0.0804, 0.2017, 0.1763, 0.1161], device='cuda:0'), in_proj_covar=tensor([0.0373, 0.0524, 0.0423, 0.0469, 0.0482, 0.0462, 0.0420, 0.0405], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:21:21,675 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=327851.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:21:30,798 INFO [finetune.py:992] (0/2) Epoch 19, batch 7650, loss[loss=0.1493, simple_loss=0.2392, pruned_loss=0.02974, over 12294.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2522, pruned_loss=0.03604, over 2370547.38 frames. ], batch size: 33, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:21:31,896 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-05-19 00:21:35,020 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327870.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:21:42,395 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 00:21:50,360 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5467, 2.6062, 3.6697, 4.4433, 3.6690, 4.4304, 3.9236, 2.9180], device='cuda:0'), covar=tensor([0.0039, 0.0420, 0.0148, 0.0045, 0.0149, 0.0073, 0.0124, 0.0438], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0122, 0.0105, 0.0083, 0.0105, 0.0118, 0.0103, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:21:58,472 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.548e+02 2.582e+02 3.012e+02 3.558e+02 7.024e+02, threshold=6.024e+02, percent-clipped=1.0 2023-05-19 00:22:05,318 INFO [finetune.py:992] (0/2) Epoch 19, batch 7700, loss[loss=0.1557, simple_loss=0.2499, pruned_loss=0.03074, over 11617.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.0361, over 2375894.47 frames. 
], batch size: 48, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:22:08,175 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327918.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:22:20,173 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.67 vs. limit=2.0 2023-05-19 00:22:25,610 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4291, 4.9475, 4.3375, 5.1895, 4.6540, 2.8940, 4.1836, 3.1348], device='cuda:0'), covar=tensor([0.0778, 0.0648, 0.1356, 0.0530, 0.1040, 0.1759, 0.1191, 0.3225], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0390, 0.0373, 0.0350, 0.0381, 0.0284, 0.0357, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:22:40,563 INFO [finetune.py:992] (0/2) Epoch 19, batch 7750, loss[loss=0.1756, simple_loss=0.262, pruned_loss=0.04464, over 11624.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2527, pruned_loss=0.03629, over 2377987.56 frames. ], batch size: 48, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:22:59,503 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327990.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:03,822 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327996.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:06,606 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-228000.pt 2023-05-19 00:23:10,609 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328002.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:11,820 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.697e+02 3.096e+02 3.856e+02 1.458e+03, threshold=6.191e+02, percent-clipped=3.0 2023-05-19 00:23:18,638 INFO [finetune.py:992] (0/2) Epoch 19, batch 7800, loss[loss=0.1496, simple_loss=0.2309, pruned_loss=0.03418, over 11821.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.03593, over 2382541.79 frames. ], batch size: 26, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:23:30,129 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328030.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:44,597 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328051.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 00:23:48,854 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328057.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:53,095 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328063.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:53,583 INFO [finetune.py:992] (0/2) Epoch 19, batch 7850, loss[loss=0.1698, simple_loss=0.2639, pruned_loss=0.0378, over 12158.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2516, pruned_loss=0.03562, over 2387593.29 frames. 
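The checkpoint.py:75 entry above writes checkpoint-228000.pt into the experiment directory in the middle of the epoch, named after the global batch index. A minimal version of that kind of batch-indexed checkpoint is sketched below, bundling model, optimizer, scheduler and grad-scaler state so training can resume; the exact set of keys stored by checkpoint.py may differ.

from pathlib import Path
import torch

def save_checkpoint(exp_dir: Path, batch_idx: int, model, optimizer, scheduler, scaler):
    """Write an in-epoch checkpoint named after the global batch index."""
    exp_dir.mkdir(parents=True, exist_ok=True)
    path = exp_dir / f"checkpoint-{batch_idx}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),
            "batch_idx_train": batch_idx,
        },
        path,
    )
    print(f"Saving checkpoint to {path}")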
], batch size: 36, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:24:13,301 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328091.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:24:15,953 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3123, 5.1599, 5.2886, 5.3150, 4.9556, 5.0394, 4.7387, 5.2920], device='cuda:0'), covar=tensor([0.0787, 0.0696, 0.0934, 0.0700, 0.2066, 0.1267, 0.0654, 0.1112], device='cuda:0'), in_proj_covar=tensor([0.0577, 0.0757, 0.0662, 0.0678, 0.0903, 0.0791, 0.0605, 0.0516], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 00:24:22,347 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.646e+02 2.622e+02 2.990e+02 3.532e+02 1.143e+03, threshold=5.980e+02, percent-clipped=1.0 2023-05-19 00:24:29,371 INFO [finetune.py:992] (0/2) Epoch 19, batch 7900, loss[loss=0.1502, simple_loss=0.2292, pruned_loss=0.03567, over 12298.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.251, pruned_loss=0.0354, over 2387631.22 frames. ], batch size: 28, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:25:04,409 INFO [finetune.py:992] (0/2) Epoch 19, batch 7950, loss[loss=0.1693, simple_loss=0.2543, pruned_loss=0.0422, over 12283.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.251, pruned_loss=0.0353, over 2388871.66 frames. ], batch size: 33, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:25:08,674 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328170.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:25:32,226 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.594e+02 2.996e+02 3.774e+02 6.763e+02, threshold=5.992e+02, percent-clipped=1.0 2023-05-19 00:25:39,799 INFO [finetune.py:992] (0/2) Epoch 19, batch 8000, loss[loss=0.1607, simple_loss=0.2366, pruned_loss=0.04245, over 12279.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2519, pruned_loss=0.03644, over 2366421.05 frames. ], batch size: 28, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:25:42,572 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328218.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:25:42,640 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328218.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:26:14,227 INFO [finetune.py:992] (0/2) Epoch 19, batch 8050, loss[loss=0.1548, simple_loss=0.2386, pruned_loss=0.03554, over 12091.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2534, pruned_loss=0.03697, over 2363405.59 frames. ], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:26:16,258 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328266.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:26:42,323 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.636e+02 3.070e+02 3.893e+02 1.409e+03, threshold=6.140e+02, percent-clipped=3.0 2023-05-19 00:26:49,282 INFO [finetune.py:992] (0/2) Epoch 19, batch 8100, loss[loss=0.1537, simple_loss=0.2394, pruned_loss=0.03393, over 12257.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2527, pruned_loss=0.0367, over 2374535.12 frames. 
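The zipformer.py:625 entries keep printing a per-stack (warmup_begin, warmup_end) window together with num_to_drop and layers_to_drop, and every so often a single random layer is skipped, as in the layers_to_drop={3} line above. The toy scheduler below only conveys the flavour: keep every layer while a stack is still early in its warmup window, otherwise skip a small random subset with low probability. The probability and the exact dependence on batch_count are assumptions; the actual logic in zipformer.py may differ.

import random

def choose_layers_to_drop(batch_count: float, warmup_begin: float, warmup_end: float,
                          num_layers: int, drop_prob: float = 0.02) -> set:
    """Return the (usually empty) set of layer indices to skip for this forward pass."""
    if batch_count < warmup_begin:
        return set()                        # keep every layer while the stack is warming up
    to_drop = {i for i in range(num_layers) if random.random() < drop_prob}
    if len(to_drop) == num_layers:          # never drop the whole stack
        to_drop.pop()
    return to_drop

# e.g. choose_layers_to_drop(328346.0, 1333.3, 2000.0, num_layers=4) is set() most of the time,
# and occasionally {0}, {3}, ... as in the log lines above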
], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:27:11,448 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328346.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 00:27:16,242 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328352.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:27:20,349 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328358.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:27:24,464 INFO [finetune.py:992] (0/2) Epoch 19, batch 8150, loss[loss=0.1417, simple_loss=0.2291, pruned_loss=0.02709, over 12343.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2528, pruned_loss=0.03665, over 2379628.87 frames. ], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:27:39,905 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328386.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:27:52,510 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.617e+02 3.231e+02 3.569e+02 6.494e+02, threshold=6.462e+02, percent-clipped=2.0 2023-05-19 00:27:59,266 INFO [finetune.py:992] (0/2) Epoch 19, batch 8200, loss[loss=0.1656, simple_loss=0.2546, pruned_loss=0.03824, over 12053.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.253, pruned_loss=0.03688, over 2375696.63 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:28:18,145 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-05-19 00:28:34,504 INFO [finetune.py:992] (0/2) Epoch 19, batch 8250, loss[loss=0.1794, simple_loss=0.2767, pruned_loss=0.04112, over 11740.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03658, over 2379963.46 frames. ], batch size: 44, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:29:00,110 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4568, 4.9872, 5.4256, 4.7415, 5.1022, 4.9039, 5.5087, 5.1094], device='cuda:0'), covar=tensor([0.0270, 0.0385, 0.0303, 0.0282, 0.0463, 0.0317, 0.0195, 0.0286], device='cuda:0'), in_proj_covar=tensor([0.0287, 0.0290, 0.0317, 0.0285, 0.0285, 0.0285, 0.0261, 0.0233], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:29:02,776 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.551e+02 2.709e+02 3.179e+02 3.720e+02 7.019e+02, threshold=6.358e+02, percent-clipped=3.0 2023-05-19 00:29:09,588 INFO [finetune.py:992] (0/2) Epoch 19, batch 8300, loss[loss=0.1476, simple_loss=0.2331, pruned_loss=0.03106, over 11998.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2532, pruned_loss=0.03643, over 2382307.71 frames. ], batch size: 28, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:29:43,859 INFO [finetune.py:992] (0/2) Epoch 19, batch 8350, loss[loss=0.1637, simple_loss=0.2642, pruned_loss=0.03154, over 12141.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.036, over 2390044.58 frames. 
], batch size: 36, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:29:59,231 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3149, 4.5388, 2.7868, 2.1563, 3.9997, 2.2852, 3.7893, 3.1151], device='cuda:0'), covar=tensor([0.0726, 0.0704, 0.1340, 0.2205, 0.0323, 0.1732, 0.0588, 0.0880], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0271, 0.0185, 0.0212, 0.0149, 0.0190, 0.0209, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:30:12,370 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.605e+02 3.075e+02 3.770e+02 5.047e+02, threshold=6.151e+02, percent-clipped=0.0 2023-05-19 00:30:17,118 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 00:30:19,351 INFO [finetune.py:992] (0/2) Epoch 19, batch 8400, loss[loss=0.1505, simple_loss=0.2378, pruned_loss=0.03163, over 12097.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2523, pruned_loss=0.03645, over 2377924.97 frames. ], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:30:34,268 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.70 vs. limit=5.0 2023-05-19 00:30:42,126 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328646.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:30:45,661 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1056, 4.9106, 4.9647, 5.0634, 4.8913, 5.1134, 4.9960, 2.7637], device='cuda:0'), covar=tensor([0.0117, 0.0068, 0.0090, 0.0071, 0.0054, 0.0106, 0.0090, 0.0795], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0089, 0.0078, 0.0065, 0.0100, 0.0088, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:30:46,298 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328652.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:30:50,393 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328658.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:30:54,515 INFO [finetune.py:992] (0/2) Epoch 19, batch 8450, loss[loss=0.1763, simple_loss=0.266, pruned_loss=0.04331, over 12393.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2525, pruned_loss=0.03618, over 2378296.89 frames. ], batch size: 38, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:31:09,961 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328686.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:15,266 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328694.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:19,551 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328700.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:22,245 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.607e+02 3.087e+02 3.747e+02 7.122e+02, threshold=6.173e+02, percent-clipped=3.0 2023-05-19 00:31:23,690 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328706.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:29,727 INFO [finetune.py:992] (0/2) Epoch 19, batch 8500, loss[loss=0.1535, simple_loss=0.234, pruned_loss=0.03648, over 12127.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2521, pruned_loss=0.03629, over 2374359.29 frames. 
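The scaling.py:679 entries compare a per-group Whitening metric against a limit of 2.0 or 5.0. One plausible metric with the logged behaviour, equal to 1.0 when the covariance of each channel group is proportional to the identity and growing as channels become correlated or unevenly scaled, is sketched below; it illustrates the idea and is not necessarily the exact formula used in scaling.py.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels). Returns 1.0 for perfectly whitened features, larger otherwise."""
    num_frames, num_channels = x.shape
    d = num_channels // num_groups                                  # channels per group
    x = x.reshape(num_frames, num_groups, d).transpose(0, 1)        # (groups, frames, d)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames           # per-group covariance, (groups, d, d)
    trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)                 # tr(C)
    trace_sq = (cov * cov).sum(dim=(1, 2))                          # tr(C @ C), since C is symmetric
    metric = d * trace_sq / (trace ** 2)                            # 1.0 exactly when C is proportional to I
    return metric.mean().item()

# for white noise the metric approaches 1.0 as num_frames grows, e.g.
# whitening_metric(torch.randn(100000, 384), num_groups=1)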
], batch size: 30, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:31:41,212 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328730.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:43,939 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328734.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:46,158 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4893, 5.3033, 5.4098, 5.4432, 5.0803, 5.1002, 4.8729, 5.3501], device='cuda:0'), covar=tensor([0.0672, 0.0564, 0.0941, 0.0561, 0.1857, 0.1413, 0.0625, 0.1096], device='cuda:0'), in_proj_covar=tensor([0.0575, 0.0753, 0.0657, 0.0674, 0.0905, 0.0789, 0.0604, 0.0515], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 00:31:53,754 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328748.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:32:04,466 INFO [finetune.py:992] (0/2) Epoch 19, batch 8550, loss[loss=0.1623, simple_loss=0.2529, pruned_loss=0.03585, over 12032.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2525, pruned_loss=0.03648, over 2372876.69 frames. ], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:32:05,376 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1274, 4.9639, 4.9784, 4.9169, 4.6848, 5.0570, 5.0695, 5.2196], device='cuda:0'), covar=tensor([0.0227, 0.0165, 0.0177, 0.0334, 0.0712, 0.0333, 0.0149, 0.0187], device='cuda:0'), in_proj_covar=tensor([0.0206, 0.0209, 0.0200, 0.0259, 0.0250, 0.0231, 0.0186, 0.0242], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 00:32:24,457 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328791.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:32:26,572 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4668, 3.2352, 4.8689, 2.5121, 2.6707, 3.5167, 3.0472, 3.5450], device='cuda:0'), covar=tensor([0.0529, 0.1303, 0.0389, 0.1330, 0.2138, 0.1792, 0.1556, 0.1563], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0242, 0.0268, 0.0190, 0.0242, 0.0300, 0.0231, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:32:33,596 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.478e+02 2.919e+02 3.511e+02 6.754e+02, threshold=5.838e+02, percent-clipped=1.0 2023-05-19 00:32:37,204 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328809.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:32:40,606 INFO [finetune.py:992] (0/2) Epoch 19, batch 8600, loss[loss=0.1792, simple_loss=0.2778, pruned_loss=0.04029, over 12128.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2514, pruned_loss=0.03618, over 2377594.29 frames. ], batch size: 42, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:32:50,556 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328828.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:33:13,374 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328861.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:33:15,793 INFO [finetune.py:992] (0/2) Epoch 19, batch 8650, loss[loss=0.132, simple_loss=0.2195, pruned_loss=0.0223, over 12266.00 frames. 
], tot_loss[loss=0.1623, simple_loss=0.2521, pruned_loss=0.03626, over 2374281.71 frames. ], batch size: 28, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:33:20,088 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4782, 4.8434, 3.0881, 2.4608, 4.2798, 2.4100, 4.1686, 3.3179], device='cuda:0'), covar=tensor([0.0702, 0.0661, 0.1260, 0.2243, 0.0298, 0.1875, 0.0490, 0.1046], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0268, 0.0183, 0.0210, 0.0148, 0.0189, 0.0207, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:33:33,164 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328889.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:33:36,596 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0372, 6.0517, 5.8010, 5.3840, 5.1802, 5.9755, 5.5355, 5.3352], device='cuda:0'), covar=tensor([0.0799, 0.0923, 0.0693, 0.1772, 0.0803, 0.0791, 0.1665, 0.1054], device='cuda:0'), in_proj_covar=tensor([0.0670, 0.0597, 0.0551, 0.0673, 0.0449, 0.0781, 0.0827, 0.0593], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 00:33:43,394 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.567e+02 3.201e+02 3.541e+02 5.450e+02, threshold=6.402e+02, percent-clipped=0.0 2023-05-19 00:33:48,620 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.65 vs. limit=5.0 2023-05-19 00:33:50,353 INFO [finetune.py:992] (0/2) Epoch 19, batch 8700, loss[loss=0.1568, simple_loss=0.2569, pruned_loss=0.02834, over 12307.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.252, pruned_loss=0.03619, over 2372936.85 frames. ], batch size: 34, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:33:56,027 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4851, 4.8809, 3.2286, 2.8970, 4.2447, 2.7223, 4.1494, 3.6318], device='cuda:0'), covar=tensor([0.0666, 0.0697, 0.1038, 0.1446, 0.0324, 0.1304, 0.0472, 0.0646], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0268, 0.0183, 0.0209, 0.0148, 0.0189, 0.0207, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:33:56,043 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328922.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:34:25,540 INFO [finetune.py:992] (0/2) Epoch 19, batch 8750, loss[loss=0.1678, simple_loss=0.2621, pruned_loss=0.03674, over 12033.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2515, pruned_loss=0.03615, over 2366475.91 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:34:35,220 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. 
limit=2.0 2023-05-19 00:34:53,805 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.542e+02 3.003e+02 3.580e+02 9.086e+02, threshold=6.006e+02, percent-clipped=1.0 2023-05-19 00:34:58,817 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4513, 4.2646, 4.3024, 4.3507, 4.0451, 4.4945, 4.4407, 4.5858], device='cuda:0'), covar=tensor([0.0296, 0.0223, 0.0224, 0.0414, 0.0777, 0.0403, 0.0194, 0.0231], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0209, 0.0200, 0.0259, 0.0250, 0.0231, 0.0186, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 00:35:01,330 INFO [finetune.py:992] (0/2) Epoch 19, batch 8800, loss[loss=0.165, simple_loss=0.2638, pruned_loss=0.03313, over 12044.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2508, pruned_loss=0.03594, over 2369979.81 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:35:23,633 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.8490, 5.8337, 5.6056, 5.1429, 5.1315, 5.7541, 5.3526, 5.1580], device='cuda:0'), covar=tensor([0.0848, 0.1030, 0.0733, 0.1798, 0.0775, 0.0774, 0.1601, 0.0970], device='cuda:0'), in_proj_covar=tensor([0.0673, 0.0600, 0.0554, 0.0674, 0.0450, 0.0785, 0.0828, 0.0597], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 00:35:35,963 INFO [finetune.py:992] (0/2) Epoch 19, batch 8850, loss[loss=0.1657, simple_loss=0.2527, pruned_loss=0.0393, over 12057.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2509, pruned_loss=0.03598, over 2374701.33 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:35:45,114 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329077.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:35:51,794 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329086.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:36:04,365 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.566e+02 3.001e+02 3.699e+02 8.491e+02, threshold=6.002e+02, percent-clipped=2.0 2023-05-19 00:36:04,449 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329104.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:36:10,988 INFO [finetune.py:992] (0/2) Epoch 19, batch 8900, loss[loss=0.1728, simple_loss=0.2634, pruned_loss=0.0411, over 12126.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2515, pruned_loss=0.03619, over 2372061.30 frames. ], batch size: 39, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:36:12,577 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3924, 3.5039, 3.2580, 2.9800, 2.7936, 2.6398, 3.5302, 2.1890], device='cuda:0'), covar=tensor([0.0498, 0.0161, 0.0239, 0.0294, 0.0478, 0.0420, 0.0156, 0.0655], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0177, 0.0177, 0.0204, 0.0213, 0.0210, 0.0186, 0.0217], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:36:27,727 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329138.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:36:46,162 INFO [finetune.py:992] (0/2) Epoch 19, batch 8950, loss[loss=0.1639, simple_loss=0.2495, pruned_loss=0.03915, over 12086.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2516, pruned_loss=0.03608, over 2374000.07 frames. 
], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:36:58,054 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2694, 5.1592, 5.0749, 5.0925, 4.7852, 5.1964, 5.2178, 5.3943], device='cuda:0'), covar=tensor([0.0282, 0.0162, 0.0196, 0.0338, 0.0808, 0.0348, 0.0192, 0.0205], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0209, 0.0200, 0.0259, 0.0249, 0.0231, 0.0186, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 00:37:00,125 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329184.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:37:15,438 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.669e+02 2.793e+02 3.123e+02 3.747e+02 1.122e+03, threshold=6.245e+02, percent-clipped=3.0 2023-05-19 00:37:18,420 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4921, 2.3645, 3.8597, 4.5209, 3.8718, 4.5443, 4.1216, 3.3344], device='cuda:0'), covar=tensor([0.0064, 0.0631, 0.0136, 0.0049, 0.0156, 0.0083, 0.0111, 0.0398], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0127, 0.0109, 0.0086, 0.0109, 0.0122, 0.0108, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:37:22,328 INFO [finetune.py:992] (0/2) Epoch 19, batch 9000, loss[loss=0.1721, simple_loss=0.2645, pruned_loss=0.03978, over 12194.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2509, pruned_loss=0.03565, over 2376449.05 frames. ], batch size: 35, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:37:22,328 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 00:37:39,915 INFO [finetune.py:1026] (0/2) Epoch 19, validation: loss=0.3217, simple_loss=0.3941, pruned_loss=0.1246, over 1020973.00 frames. 2023-05-19 00:37:39,916 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 00:37:41,987 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329217.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:37:46,977 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.4254, 3.4487, 3.2248, 2.9389, 2.7467, 2.6848, 3.4453, 2.2019], device='cuda:0'), covar=tensor([0.0480, 0.0208, 0.0221, 0.0285, 0.0496, 0.0424, 0.0191, 0.0601], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0175, 0.0175, 0.0203, 0.0211, 0.0208, 0.0184, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:38:14,254 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329263.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:38:14,781 INFO [finetune.py:992] (0/2) Epoch 19, batch 9050, loss[loss=0.1638, simple_loss=0.2556, pruned_loss=0.03599, over 12349.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2518, pruned_loss=0.03657, over 2359277.60 frames. ], batch size: 36, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:38:18,419 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329269.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:38:26,438 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.02 vs. 
limit=5.0 2023-05-19 00:38:42,483 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.752e+02 3.154e+02 3.848e+02 9.009e+02, threshold=6.308e+02, percent-clipped=2.0 2023-05-19 00:38:49,335 INFO [finetune.py:992] (0/2) Epoch 19, batch 9100, loss[loss=0.1585, simple_loss=0.2518, pruned_loss=0.03264, over 11092.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2531, pruned_loss=0.03698, over 2363161.28 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:38:56,892 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329324.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:01,183 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329330.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:04,688 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 00:39:07,726 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329340.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:39:24,482 INFO [finetune.py:992] (0/2) Epoch 19, batch 9150, loss[loss=0.1612, simple_loss=0.26, pruned_loss=0.03118, over 12359.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2538, pruned_loss=0.03711, over 2363267.56 frames. ], batch size: 36, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:39:36,431 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329381.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:39,738 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329386.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:51,163 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329401.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:39:53,063 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.594e+02 3.008e+02 3.738e+02 1.154e+03, threshold=6.015e+02, percent-clipped=2.0 2023-05-19 00:39:53,175 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329404.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:59,913 INFO [finetune.py:992] (0/2) Epoch 19, batch 9200, loss[loss=0.1763, simple_loss=0.2619, pruned_loss=0.04536, over 12132.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.254, pruned_loss=0.0373, over 2369108.58 frames. 
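The two validation summaries in this section report loss=0.3044 and then loss=0.3217 at batch 9000, while the training tot_loss stays near 0.162; that kind of drift is what a best-validation tracker is meant to catch. A minimal sketch, using generic field names rather than whatever finetune.py actually keeps:

import math

class BestValidTracker:
    """Remember the lowest validation loss seen so far and the epoch it occurred in."""

    def __init__(self):
        self.best_loss = math.inf
        self.best_epoch = -1

    def update(self, epoch: int, valid_loss: float) -> bool:
        """True if this validation pass is a new best, e.g. to decide which checkpoint to keep."""
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.best_epoch = epoch
            return True
        return False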
], batch size: 38, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:40:13,462 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329433.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:14,157 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329434.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:19,781 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329442.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:23,977 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4691, 5.3255, 5.4075, 5.4793, 5.0731, 5.1524, 4.8634, 5.3324], device='cuda:0'), covar=tensor([0.0684, 0.0606, 0.0937, 0.0545, 0.2063, 0.1377, 0.0599, 0.1129], device='cuda:0'), in_proj_covar=tensor([0.0575, 0.0751, 0.0660, 0.0674, 0.0905, 0.0790, 0.0603, 0.0515], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 00:40:26,673 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329452.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:34,961 INFO [finetune.py:992] (0/2) Epoch 19, batch 9250, loss[loss=0.1372, simple_loss=0.2232, pruned_loss=0.02565, over 12128.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2535, pruned_loss=0.03668, over 2375283.93 frames. ], batch size: 30, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:40:49,762 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329484.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:56,532 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0608, 6.0558, 5.7023, 5.2635, 5.2104, 5.9308, 5.5292, 5.2512], device='cuda:0'), covar=tensor([0.0759, 0.0833, 0.0782, 0.1710, 0.0750, 0.0678, 0.1579, 0.1032], device='cuda:0'), in_proj_covar=tensor([0.0664, 0.0595, 0.0550, 0.0669, 0.0449, 0.0780, 0.0819, 0.0592], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 00:41:01,617 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329501.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:02,367 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8689, 3.4272, 5.2522, 2.6190, 3.0978, 3.9276, 3.3856, 3.8137], device='cuda:0'), covar=tensor([0.0399, 0.1154, 0.0366, 0.1274, 0.1774, 0.1552, 0.1315, 0.1324], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0242, 0.0269, 0.0189, 0.0241, 0.0300, 0.0230, 0.0276], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:41:03,523 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.584e+02 2.998e+02 3.473e+02 7.449e+02, threshold=5.995e+02, percent-clipped=2.0 2023-05-19 00:41:10,362 INFO [finetune.py:992] (0/2) Epoch 19, batch 9300, loss[loss=0.1609, simple_loss=0.2477, pruned_loss=0.03708, over 12081.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2534, pruned_loss=0.03679, over 2374213.16 frames. 
], batch size: 32, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:41:12,685 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329517.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:22,949 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329532.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:23,158 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1022, 4.4337, 3.9388, 4.9025, 4.4602, 2.7893, 4.0330, 3.0250], device='cuda:0'), covar=tensor([0.0999, 0.1067, 0.1740, 0.0630, 0.1251, 0.1960, 0.1376, 0.3563], device='cuda:0'), in_proj_covar=tensor([0.0319, 0.0393, 0.0374, 0.0352, 0.0383, 0.0284, 0.0359, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:41:30,542 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329543.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:38,699 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329554.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:44,298 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329562.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:45,471 INFO [finetune.py:992] (0/2) Epoch 19, batch 9350, loss[loss=0.1667, simple_loss=0.2587, pruned_loss=0.03733, over 10534.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.254, pruned_loss=0.03682, over 2372248.54 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:41:46,227 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329565.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:13,094 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.643e+02 3.011e+02 3.895e+02 7.917e+02, threshold=6.022e+02, percent-clipped=5.0 2023-05-19 00:42:13,319 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329604.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:16,794 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0914, 4.7657, 4.9092, 4.9644, 4.7263, 4.9762, 4.7953, 2.6620], device='cuda:0'), covar=tensor([0.0110, 0.0077, 0.0094, 0.0066, 0.0053, 0.0097, 0.0098, 0.0822], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0084, 0.0087, 0.0077, 0.0064, 0.0098, 0.0086, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:42:20,600 INFO [finetune.py:992] (0/2) Epoch 19, batch 9400, loss[loss=0.1623, simple_loss=0.257, pruned_loss=0.03374, over 11638.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2537, pruned_loss=0.03674, over 2366717.34 frames. 
], batch size: 48, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:42:21,513 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329615.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:24,141 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329619.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:28,428 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329625.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:32,826 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.0384, 2.2494, 2.3093, 2.2242, 2.0713, 2.0404, 2.1734, 1.8078], device='cuda:0'), covar=tensor([0.0389, 0.0208, 0.0282, 0.0231, 0.0386, 0.0287, 0.0259, 0.0453], device='cuda:0'), in_proj_covar=tensor([0.0204, 0.0174, 0.0175, 0.0201, 0.0208, 0.0206, 0.0183, 0.0213], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:42:55,268 INFO [finetune.py:992] (0/2) Epoch 19, batch 9450, loss[loss=0.1637, simple_loss=0.2586, pruned_loss=0.03443, over 12043.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2538, pruned_loss=0.03659, over 2368289.21 frames. ], batch size: 40, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:43:18,090 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329696.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 00:43:23,556 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.436e+02 3.019e+02 3.891e+02 1.058e+03, threshold=6.038e+02, percent-clipped=2.0 2023-05-19 00:43:25,922 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0262, 4.5064, 3.8417, 4.7804, 4.3105, 2.9310, 3.9452, 2.9907], device='cuda:0'), covar=tensor([0.0903, 0.0829, 0.1651, 0.0562, 0.1259, 0.1716, 0.1292, 0.3300], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0391, 0.0372, 0.0351, 0.0382, 0.0283, 0.0358, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:43:27,423 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.10 vs. limit=5.0 2023-05-19 00:43:30,472 INFO [finetune.py:992] (0/2) Epoch 19, batch 9500, loss[loss=0.1352, simple_loss=0.2153, pruned_loss=0.02755, over 12270.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2533, pruned_loss=0.03636, over 2370216.64 frames. ], batch size: 28, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:43:44,004 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329733.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:43:46,727 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329737.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:44:06,077 INFO [finetune.py:992] (0/2) Epoch 19, batch 9550, loss[loss=0.1657, simple_loss=0.2638, pruned_loss=0.0338, over 12062.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2528, pruned_loss=0.03592, over 2373739.51 frames. 
], batch size: 42, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:44:18,104 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329781.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:44:34,368 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.446e+02 2.927e+02 3.662e+02 8.216e+02, threshold=5.853e+02, percent-clipped=4.0 2023-05-19 00:44:41,230 INFO [finetune.py:992] (0/2) Epoch 19, batch 9600, loss[loss=0.1447, simple_loss=0.2361, pruned_loss=0.02666, over 12242.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2527, pruned_loss=0.03569, over 2381096.24 frames. ], batch size: 32, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:45:11,904 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329857.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:16,624 INFO [finetune.py:992] (0/2) Epoch 19, batch 9650, loss[loss=0.1551, simple_loss=0.2474, pruned_loss=0.03134, over 12358.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2531, pruned_loss=0.03599, over 2374107.67 frames. ], batch size: 36, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:45:17,478 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0967, 6.0873, 5.7477, 5.2574, 5.3268, 5.9605, 5.6011, 5.3561], device='cuda:0'), covar=tensor([0.0675, 0.0802, 0.0657, 0.1896, 0.0725, 0.0748, 0.1583, 0.1024], device='cuda:0'), in_proj_covar=tensor([0.0665, 0.0598, 0.0551, 0.0671, 0.0450, 0.0781, 0.0819, 0.0595], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 00:45:19,506 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329868.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:40,920 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329899.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:44,310 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.640e+02 3.056e+02 3.733e+02 8.200e+02, threshold=6.112e+02, percent-clipped=3.0 2023-05-19 00:45:49,139 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329910.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:51,815 INFO [finetune.py:992] (0/2) Epoch 19, batch 9700, loss[loss=0.1871, simple_loss=0.274, pruned_loss=0.05011, over 8300.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2528, pruned_loss=0.03592, over 2377706.38 frames. ], batch size: 101, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:45:55,453 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329919.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:59,631 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329925.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:02,448 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329929.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:46:26,512 INFO [finetune.py:992] (0/2) Epoch 19, batch 9750, loss[loss=0.1699, simple_loss=0.255, pruned_loss=0.04243, over 10361.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2525, pruned_loss=0.03577, over 2373709.14 frames. 
], batch size: 68, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:46:28,697 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329967.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:32,839 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329973.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:47,121 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329993.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:49,166 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329996.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 00:46:52,027 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-230000.pt 2023-05-19 00:46:57,545 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.696e+02 3.076e+02 3.718e+02 7.389e+02, threshold=6.152e+02, percent-clipped=3.0 2023-05-19 00:47:04,573 INFO [finetune.py:992] (0/2) Epoch 19, batch 9800, loss[loss=0.1851, simple_loss=0.2876, pruned_loss=0.0413, over 12285.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.253, pruned_loss=0.03578, over 2369565.17 frames. ], batch size: 37, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:47:20,563 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330037.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:47:25,395 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330044.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:47:32,921 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=330054.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:47:39,701 INFO [finetune.py:992] (0/2) Epoch 19, batch 9850, loss[loss=0.1612, simple_loss=0.2558, pruned_loss=0.03327, over 12038.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2536, pruned_loss=0.03609, over 2369367.08 frames. ], batch size: 40, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:47:44,754 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5814, 3.6432, 3.2569, 3.0936, 2.9250, 2.7161, 3.5875, 2.4484], device='cuda:0'), covar=tensor([0.0459, 0.0165, 0.0277, 0.0258, 0.0436, 0.0503, 0.0195, 0.0519], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0175, 0.0176, 0.0203, 0.0209, 0.0209, 0.0185, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:47:54,315 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330085.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:48:07,189 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.714e+02 3.098e+02 3.571e+02 7.678e+02, threshold=6.196e+02, percent-clipped=2.0 2023-05-19 00:48:14,327 INFO [finetune.py:992] (0/2) Epoch 19, batch 9900, loss[loss=0.1864, simple_loss=0.2767, pruned_loss=0.04802, over 12085.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2529, pruned_loss=0.03598, over 2373330.07 frames. 
], batch size: 42, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:48:23,136 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2773, 2.6945, 3.7913, 3.2112, 3.6537, 3.2938, 2.9120, 3.6794], device='cuda:0'), covar=tensor([0.0159, 0.0402, 0.0187, 0.0278, 0.0139, 0.0227, 0.0387, 0.0153], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0215, 0.0205, 0.0200, 0.0232, 0.0179, 0.0209, 0.0205], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:48:45,330 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330157.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:48:50,044 INFO [finetune.py:992] (0/2) Epoch 19, batch 9950, loss[loss=0.1761, simple_loss=0.2707, pruned_loss=0.04074, over 12049.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2517, pruned_loss=0.03533, over 2376342.53 frames. ], batch size: 40, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:49:02,401 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-05-19 00:49:10,557 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6282, 3.6691, 3.3176, 3.1630, 2.8821, 2.7778, 3.6984, 2.4234], device='cuda:0'), covar=tensor([0.0437, 0.0186, 0.0214, 0.0267, 0.0475, 0.0429, 0.0179, 0.0531], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0175, 0.0176, 0.0203, 0.0209, 0.0209, 0.0185, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:49:15,306 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330199.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:18,898 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.491e+02 2.998e+02 3.513e+02 7.813e+02, threshold=5.996e+02, percent-clipped=2.0 2023-05-19 00:49:19,723 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330205.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:23,198 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330210.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:25,792 INFO [finetune.py:992] (0/2) Epoch 19, batch 10000, loss[loss=0.1727, simple_loss=0.2597, pruned_loss=0.04286, over 12114.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2516, pruned_loss=0.03549, over 2369139.43 frames. ], batch size: 38, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:49:32,636 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=330224.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 00:49:48,500 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330247.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:56,003 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330258.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:50:00,217 INFO [finetune.py:992] (0/2) Epoch 19, batch 10050, loss[loss=0.1693, simple_loss=0.2565, pruned_loss=0.04101, over 12082.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2517, pruned_loss=0.0356, over 2367475.07 frames. 
], batch size: 42, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:50:28,379 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.676e+02 2.491e+02 3.061e+02 3.700e+02 9.999e+02, threshold=6.122e+02, percent-clipped=4.0 2023-05-19 00:50:35,444 INFO [finetune.py:992] (0/2) Epoch 19, batch 10100, loss[loss=0.1298, simple_loss=0.2231, pruned_loss=0.01826, over 12042.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2516, pruned_loss=0.03557, over 2373584.84 frames. ], batch size: 31, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:50:51,375 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6006, 3.5689, 3.2289, 3.0692, 2.7902, 2.7139, 3.5177, 2.3336], device='cuda:0'), covar=tensor([0.0428, 0.0164, 0.0230, 0.0265, 0.0433, 0.0396, 0.0148, 0.0573], device='cuda:0'), in_proj_covar=tensor([0.0206, 0.0175, 0.0177, 0.0204, 0.0210, 0.0210, 0.0185, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:51:00,230 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=330349.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:51:10,774 INFO [finetune.py:992] (0/2) Epoch 19, batch 10150, loss[loss=0.1381, simple_loss=0.232, pruned_loss=0.02207, over 12332.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2512, pruned_loss=0.03528, over 2374195.04 frames. ], batch size: 31, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:51:32,534 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.10 vs. limit=5.0 2023-05-19 00:51:38,896 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.529e+02 2.869e+02 3.349e+02 8.693e+02, threshold=5.739e+02, percent-clipped=2.0 2023-05-19 00:51:45,825 INFO [finetune.py:992] (0/2) Epoch 19, batch 10200, loss[loss=0.1393, simple_loss=0.2367, pruned_loss=0.02097, over 12343.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.03545, over 2382495.46 frames. ], batch size: 35, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:51:50,969 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3095, 4.7804, 4.1389, 5.0671, 4.4402, 2.8951, 4.1325, 3.0702], device='cuda:0'), covar=tensor([0.0835, 0.0745, 0.1645, 0.0452, 0.1457, 0.1735, 0.1212, 0.3436], device='cuda:0'), in_proj_covar=tensor([0.0312, 0.0385, 0.0366, 0.0345, 0.0375, 0.0278, 0.0352, 0.0369], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:52:21,102 INFO [finetune.py:992] (0/2) Epoch 19, batch 10250, loss[loss=0.1787, simple_loss=0.2614, pruned_loss=0.04797, over 11672.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2518, pruned_loss=0.03544, over 2384467.98 frames. 
], batch size: 48, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:52:49,377 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.666e+02 3.051e+02 3.730e+02 8.673e+02, threshold=6.102e+02, percent-clipped=3.0 2023-05-19 00:52:53,588 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0680, 6.0137, 5.7000, 5.5074, 6.0941, 5.4326, 5.3996, 5.5665], device='cuda:0'), covar=tensor([0.1652, 0.0926, 0.1004, 0.1886, 0.0860, 0.2108, 0.2341, 0.1207], device='cuda:0'), in_proj_covar=tensor([0.0372, 0.0521, 0.0427, 0.0471, 0.0480, 0.0465, 0.0423, 0.0407], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:52:56,312 INFO [finetune.py:992] (0/2) Epoch 19, batch 10300, loss[loss=0.1525, simple_loss=0.25, pruned_loss=0.02751, over 12151.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2517, pruned_loss=0.03527, over 2387002.25 frames. ], batch size: 34, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:53:03,497 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330524.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:53:08,411 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=330531.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:53:22,390 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2393, 4.7749, 3.9462, 4.8994, 4.4283, 2.5161, 3.9346, 2.8416], device='cuda:0'), covar=tensor([0.0902, 0.0677, 0.1634, 0.0618, 0.1224, 0.2145, 0.1371, 0.3558], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0386, 0.0368, 0.0348, 0.0377, 0.0279, 0.0353, 0.0370], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:53:31,200 INFO [finetune.py:992] (0/2) Epoch 19, batch 10350, loss[loss=0.1627, simple_loss=0.2599, pruned_loss=0.0328, over 11523.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2519, pruned_loss=0.03535, over 2383487.15 frames. ], batch size: 48, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:53:36,787 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330572.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:53:51,402 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=330592.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:54:00,003 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.557e+02 3.095e+02 3.697e+02 6.958e+02, threshold=6.189e+02, percent-clipped=1.0 2023-05-19 00:54:06,962 INFO [finetune.py:992] (0/2) Epoch 19, batch 10400, loss[loss=0.1779, simple_loss=0.2703, pruned_loss=0.0428, over 12339.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2518, pruned_loss=0.03563, over 2383251.38 frames. ], batch size: 36, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:54:31,669 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330649.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:54:33,429 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.44 vs. limit=5.0 2023-05-19 00:54:42,167 INFO [finetune.py:992] (0/2) Epoch 19, batch 10450, loss[loss=0.1597, simple_loss=0.2503, pruned_loss=0.03449, over 12290.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2505, pruned_loss=0.03519, over 2383542.35 frames. 
], batch size: 34, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:55:05,161 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330697.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:55:08,820 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2669, 4.8584, 5.0272, 5.1145, 4.9248, 5.1237, 4.9304, 2.8343], device='cuda:0'), covar=tensor([0.0067, 0.0071, 0.0080, 0.0063, 0.0059, 0.0102, 0.0090, 0.0716], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0084, 0.0086, 0.0077, 0.0063, 0.0098, 0.0085, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:55:10,061 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.534e+02 3.011e+02 3.548e+02 6.223e+02, threshold=6.023e+02, percent-clipped=1.0 2023-05-19 00:55:11,670 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6592, 4.3198, 4.3464, 4.5350, 4.3923, 4.5683, 4.4132, 2.5907], device='cuda:0'), covar=tensor([0.0091, 0.0077, 0.0116, 0.0062, 0.0053, 0.0097, 0.0090, 0.0833], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0084, 0.0086, 0.0077, 0.0063, 0.0098, 0.0085, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:55:17,042 INFO [finetune.py:992] (0/2) Epoch 19, batch 10500, loss[loss=0.1407, simple_loss=0.2293, pruned_loss=0.02601, over 12191.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2498, pruned_loss=0.03491, over 2379353.81 frames. ], batch size: 31, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:55:52,437 INFO [finetune.py:992] (0/2) Epoch 19, batch 10550, loss[loss=0.1744, simple_loss=0.2718, pruned_loss=0.0385, over 12105.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.2497, pruned_loss=0.03481, over 2389293.16 frames. ], batch size: 38, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:55:56,742 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2307, 4.0329, 4.0834, 4.4950, 3.0472, 4.0123, 2.7980, 4.1428], device='cuda:0'), covar=tensor([0.1686, 0.0775, 0.0986, 0.0577, 0.1176, 0.0587, 0.1771, 0.1015], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0276, 0.0303, 0.0366, 0.0248, 0.0248, 0.0266, 0.0373], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 00:56:20,580 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.559e+02 2.899e+02 3.575e+02 7.514e+02, threshold=5.797e+02, percent-clipped=2.0 2023-05-19 00:56:27,321 INFO [finetune.py:992] (0/2) Epoch 19, batch 10600, loss[loss=0.1575, simple_loss=0.2499, pruned_loss=0.0326, over 12151.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.251, pruned_loss=0.03515, over 2388604.49 frames. ], batch size: 36, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:57:01,431 INFO [finetune.py:992] (0/2) Epoch 19, batch 10650, loss[loss=0.1298, simple_loss=0.2118, pruned_loss=0.02391, over 11979.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2518, pruned_loss=0.03558, over 2390553.85 frames. 
], batch size: 28, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:57:18,533 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=330887.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:57:31,060 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.601e+02 3.010e+02 3.765e+02 7.025e+02, threshold=6.019e+02, percent-clipped=3.0 2023-05-19 00:57:33,307 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9188, 4.8089, 4.7543, 4.7841, 4.1419, 4.9988, 4.9722, 5.0725], device='cuda:0'), covar=tensor([0.0336, 0.0191, 0.0220, 0.0481, 0.1147, 0.0418, 0.0184, 0.0247], device='cuda:0'), in_proj_covar=tensor([0.0209, 0.0212, 0.0204, 0.0263, 0.0253, 0.0233, 0.0188, 0.0245], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 00:57:37,836 INFO [finetune.py:992] (0/2) Epoch 19, batch 10700, loss[loss=0.1845, simple_loss=0.2667, pruned_loss=0.05115, over 12097.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2517, pruned_loss=0.03553, over 2388000.31 frames. ], batch size: 32, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:58:12,727 INFO [finetune.py:992] (0/2) Epoch 19, batch 10750, loss[loss=0.1519, simple_loss=0.2506, pruned_loss=0.02659, over 12144.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2517, pruned_loss=0.03575, over 2380555.46 frames. ], batch size: 34, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:58:13,651 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1031, 2.4761, 3.6842, 3.0364, 3.4179, 3.1699, 2.6215, 3.5477], device='cuda:0'), covar=tensor([0.0165, 0.0426, 0.0164, 0.0283, 0.0222, 0.0216, 0.0380, 0.0193], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0219, 0.0209, 0.0203, 0.0236, 0.0183, 0.0212, 0.0209], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:58:41,597 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.665e+02 3.088e+02 3.452e+02 7.597e+02, threshold=6.175e+02, percent-clipped=1.0 2023-05-19 00:58:42,651 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2023-05-19 00:58:47,183 INFO [finetune.py:992] (0/2) Epoch 19, batch 10800, loss[loss=0.1378, simple_loss=0.2218, pruned_loss=0.02693, over 12280.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2505, pruned_loss=0.03536, over 2386565.93 frames. ], batch size: 28, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 00:58:48,823 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2654, 2.4152, 3.0741, 4.0602, 2.2923, 4.1547, 4.2398, 4.2987], device='cuda:0'), covar=tensor([0.0178, 0.1476, 0.0615, 0.0222, 0.1452, 0.0299, 0.0205, 0.0140], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0208, 0.0188, 0.0127, 0.0193, 0.0188, 0.0186, 0.0132], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:0') 2023-05-19 00:58:49,785 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. 
limit=2.0 2023-05-19 00:58:52,234 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3180, 4.6840, 2.9305, 2.7826, 3.9833, 2.4602, 3.9014, 3.2524], device='cuda:0'), covar=tensor([0.0735, 0.0473, 0.1281, 0.1547, 0.0401, 0.1543, 0.0557, 0.0837], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0269, 0.0183, 0.0209, 0.0150, 0.0189, 0.0208, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 00:58:58,113 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.61 vs. limit=2.0 2023-05-19 00:59:23,002 INFO [finetune.py:992] (0/2) Epoch 19, batch 10850, loss[loss=0.1715, simple_loss=0.2705, pruned_loss=0.03628, over 11072.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2509, pruned_loss=0.03582, over 2389066.01 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 00:59:52,245 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.639e+02 3.104e+02 3.812e+02 6.693e+02, threshold=6.207e+02, percent-clipped=2.0 2023-05-19 00:59:57,895 INFO [finetune.py:992] (0/2) Epoch 19, batch 10900, loss[loss=0.1342, simple_loss=0.2158, pruned_loss=0.02629, over 12334.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2518, pruned_loss=0.03622, over 2378672.75 frames. ], batch size: 30, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 01:00:28,032 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7801, 2.1821, 3.4767, 2.8638, 3.3506, 2.8722, 2.2137, 3.4501], device='cuda:0'), covar=tensor([0.0235, 0.0586, 0.0226, 0.0343, 0.0223, 0.0291, 0.0542, 0.0186], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0218, 0.0207, 0.0201, 0.0235, 0.0181, 0.0210, 0.0207], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:00:32,196 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4918, 4.8836, 3.1939, 2.9728, 4.2001, 2.8872, 4.1026, 3.6139], device='cuda:0'), covar=tensor([0.0734, 0.0511, 0.1115, 0.1369, 0.0299, 0.1220, 0.0537, 0.0703], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0268, 0.0182, 0.0208, 0.0149, 0.0188, 0.0207, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:00:32,682 INFO [finetune.py:992] (0/2) Epoch 19, batch 10950, loss[loss=0.1452, simple_loss=0.2351, pruned_loss=0.02762, over 12175.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2525, pruned_loss=0.03657, over 2374319.50 frames. ], batch size: 29, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 01:00:48,519 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=331187.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:01:02,268 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.828e+02 3.249e+02 3.909e+02 9.795e+02, threshold=6.498e+02, percent-clipped=5.0 2023-05-19 01:01:08,061 INFO [finetune.py:992] (0/2) Epoch 19, batch 11000, loss[loss=0.2197, simple_loss=0.3143, pruned_loss=0.06251, over 10502.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2556, pruned_loss=0.03827, over 2340821.72 frames. 
], batch size: 68, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 01:01:22,403 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=331235.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:01:42,330 INFO [finetune.py:992] (0/2) Epoch 19, batch 11050, loss[loss=0.2102, simple_loss=0.2793, pruned_loss=0.07054, over 8160.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2562, pruned_loss=0.03869, over 2309984.49 frames. ], batch size: 97, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:02:09,966 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331302.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:02:10,313 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2023-05-19 01:02:12,321 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.196e+02 2.758e+02 3.308e+02 4.257e+02 7.925e+02, threshold=6.616e+02, percent-clipped=3.0 2023-05-19 01:02:17,848 INFO [finetune.py:992] (0/2) Epoch 19, batch 11100, loss[loss=0.1687, simple_loss=0.2523, pruned_loss=0.04261, over 12193.00 frames. ], tot_loss[loss=0.1705, simple_loss=0.2599, pruned_loss=0.04058, over 2282409.65 frames. ], batch size: 31, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:02:52,112 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331363.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:02:52,596 INFO [finetune.py:992] (0/2) Epoch 19, batch 11150, loss[loss=0.1794, simple_loss=0.2695, pruned_loss=0.04463, over 11137.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.2657, pruned_loss=0.04385, over 2234644.71 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:03:21,550 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 3.365e+02 3.844e+02 4.601e+02 6.844e+02, threshold=7.689e+02, percent-clipped=1.0 2023-05-19 01:03:26,714 INFO [finetune.py:992] (0/2) Epoch 19, batch 11200, loss[loss=0.2241, simple_loss=0.3148, pruned_loss=0.06671, over 10320.00 frames. ], tot_loss[loss=0.1849, simple_loss=0.2728, pruned_loss=0.04857, over 2154060.20 frames. ], batch size: 69, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:03:29,675 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0255, 4.9306, 5.0468, 5.0610, 4.7629, 4.8060, 4.6109, 4.9160], device='cuda:0'), covar=tensor([0.0834, 0.0593, 0.0813, 0.0620, 0.1928, 0.1445, 0.0600, 0.1258], device='cuda:0'), in_proj_covar=tensor([0.0573, 0.0748, 0.0657, 0.0671, 0.0898, 0.0785, 0.0604, 0.0516], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 01:03:52,757 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3771, 3.1027, 3.1267, 3.4106, 2.5287, 3.1642, 2.6079, 2.7898], device='cuda:0'), covar=tensor([0.1435, 0.0856, 0.0867, 0.0551, 0.1091, 0.0721, 0.1643, 0.0526], device='cuda:0'), in_proj_covar=tensor([0.0228, 0.0269, 0.0296, 0.0358, 0.0244, 0.0243, 0.0262, 0.0365], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:04:01,269 INFO [finetune.py:992] (0/2) Epoch 19, batch 11250, loss[loss=0.2435, simple_loss=0.315, pruned_loss=0.08598, over 6778.00 frames. ], tot_loss[loss=0.1936, simple_loss=0.2801, pruned_loss=0.05358, over 2084704.44 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:04:09,677 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=5.18 vs. 
limit=5.0 2023-05-19 01:04:14,033 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9267, 2.2545, 2.8972, 2.6118, 3.0151, 3.0236, 2.8496, 2.3849], device='cuda:0'), covar=tensor([0.0100, 0.0367, 0.0185, 0.0129, 0.0120, 0.0106, 0.0153, 0.0386], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0123, 0.0105, 0.0082, 0.0105, 0.0118, 0.0105, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:04:24,536 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6708, 3.3623, 3.4577, 3.7151, 3.4117, 3.8217, 3.8244, 3.7834], device='cuda:0'), covar=tensor([0.0229, 0.0217, 0.0185, 0.0390, 0.0542, 0.0299, 0.0173, 0.0269], device='cuda:0'), in_proj_covar=tensor([0.0203, 0.0205, 0.0199, 0.0255, 0.0247, 0.0227, 0.0184, 0.0240], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 01:04:29,087 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.203e+02 3.419e+02 4.236e+02 5.254e+02 1.140e+03, threshold=8.472e+02, percent-clipped=4.0 2023-05-19 01:04:35,321 INFO [finetune.py:992] (0/2) Epoch 19, batch 11300, loss[loss=0.1624, simple_loss=0.2622, pruned_loss=0.03128, over 12066.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2854, pruned_loss=0.05734, over 2024963.52 frames. ], batch size: 32, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:05:09,048 INFO [finetune.py:992] (0/2) Epoch 19, batch 11350, loss[loss=0.2616, simple_loss=0.3444, pruned_loss=0.08941, over 7288.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2901, pruned_loss=0.05997, over 1969538.79 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:05:11,980 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9810, 2.2137, 2.6822, 3.0253, 2.2230, 3.0914, 2.9395, 3.0881], device='cuda:0'), covar=tensor([0.0202, 0.1173, 0.0477, 0.0220, 0.1206, 0.0291, 0.0373, 0.0208], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0206, 0.0186, 0.0125, 0.0191, 0.0185, 0.0184, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:05:14,814 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. 
limit=2.0 2023-05-19 01:05:31,253 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6531, 3.2246, 3.4714, 3.6680, 3.6253, 3.6788, 3.5500, 2.7095], device='cuda:0'), covar=tensor([0.0097, 0.0167, 0.0176, 0.0083, 0.0073, 0.0138, 0.0085, 0.0759], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0085, 0.0087, 0.0077, 0.0064, 0.0099, 0.0086, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:05:37,792 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.593e+02 3.501e+02 4.003e+02 4.850e+02 7.916e+02, threshold=8.006e+02, percent-clipped=0.0 2023-05-19 01:05:39,149 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7757, 4.4471, 4.0248, 4.1213, 4.5378, 3.9851, 4.1361, 3.9391], device='cuda:0'), covar=tensor([0.1604, 0.1115, 0.1746, 0.1820, 0.1052, 0.2043, 0.1627, 0.1315], device='cuda:0'), in_proj_covar=tensor([0.0360, 0.0505, 0.0412, 0.0452, 0.0466, 0.0448, 0.0409, 0.0391], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:05:43,088 INFO [finetune.py:992] (0/2) Epoch 19, batch 11400, loss[loss=0.2425, simple_loss=0.3209, pruned_loss=0.08202, over 6840.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2947, pruned_loss=0.06286, over 1908395.94 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:06:10,007 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331654.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:06:12,597 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=331658.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:06:17,081 INFO [finetune.py:992] (0/2) Epoch 19, batch 11450, loss[loss=0.2359, simple_loss=0.3086, pruned_loss=0.08163, over 6673.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2967, pruned_loss=0.06467, over 1887727.36 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:06:43,966 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.608e+02 3.484e+02 4.007e+02 4.586e+02 8.783e+02, threshold=8.014e+02, percent-clipped=3.0 2023-05-19 01:06:50,115 INFO [finetune.py:992] (0/2) Epoch 19, batch 11500, loss[loss=0.201, simple_loss=0.298, pruned_loss=0.05203, over 12351.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2997, pruned_loss=0.06696, over 1865443.01 frames. ], batch size: 38, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:06:50,937 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331715.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:02,589 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8030, 3.7912, 3.8120, 3.9089, 3.7226, 3.7856, 3.6274, 3.7690], device='cuda:0'), covar=tensor([0.1372, 0.0709, 0.1354, 0.0664, 0.1489, 0.1096, 0.0622, 0.1125], device='cuda:0'), in_proj_covar=tensor([0.0553, 0.0726, 0.0634, 0.0645, 0.0864, 0.0758, 0.0582, 0.0499], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:07:03,900 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331734.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:06,630 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. 
limit=2.0 2023-05-19 01:07:23,468 INFO [finetune.py:992] (0/2) Epoch 19, batch 11550, loss[loss=0.1969, simple_loss=0.2929, pruned_loss=0.05042, over 12336.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.3024, pruned_loss=0.06935, over 1834466.73 frames. ], batch size: 36, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:07:35,424 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8959, 2.9783, 4.4184, 2.6021, 2.4739, 3.5527, 2.9846, 3.6131], device='cuda:0'), covar=tensor([0.0687, 0.1503, 0.0221, 0.1357, 0.2263, 0.1362, 0.1689, 0.1083], device='cuda:0'), in_proj_covar=tensor([0.0235, 0.0234, 0.0257, 0.0184, 0.0232, 0.0287, 0.0224, 0.0264], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:07:45,127 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331795.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:47,550 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331799.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:52,108 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.484e+02 3.907e+02 4.686e+02 6.736e+02, threshold=7.814e+02, percent-clipped=0.0 2023-05-19 01:07:57,276 INFO [finetune.py:992] (0/2) Epoch 19, batch 11600, loss[loss=0.2345, simple_loss=0.3231, pruned_loss=0.07293, over 10185.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.3028, pruned_loss=0.07023, over 1807130.10 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:08:29,493 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331860.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:08:32,135 INFO [finetune.py:992] (0/2) Epoch 19, batch 11650, loss[loss=0.1831, simple_loss=0.2788, pruned_loss=0.04371, over 12102.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.3029, pruned_loss=0.07097, over 1773801.06 frames. ], batch size: 33, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:08:46,338 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5246, 4.4822, 4.3557, 4.0704, 4.1019, 4.4809, 4.2675, 4.0766], device='cuda:0'), covar=tensor([0.0785, 0.0927, 0.0687, 0.1345, 0.2490, 0.0826, 0.1352, 0.1031], device='cuda:0'), in_proj_covar=tensor([0.0636, 0.0575, 0.0524, 0.0642, 0.0432, 0.0743, 0.0778, 0.0569], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 01:09:00,617 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.560e+02 3.351e+02 3.903e+02 4.645e+02 7.287e+02, threshold=7.805e+02, percent-clipped=0.0 2023-05-19 01:09:06,511 INFO [finetune.py:992] (0/2) Epoch 19, batch 11700, loss[loss=0.2659, simple_loss=0.3252, pruned_loss=0.1033, over 6898.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.3017, pruned_loss=0.0703, over 1769113.16 frames. 
], batch size: 99, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:09:08,032 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8051, 4.2119, 3.7238, 4.5082, 4.0155, 2.6109, 3.8107, 2.7979], device='cuda:0'), covar=tensor([0.1011, 0.0853, 0.1582, 0.0533, 0.1312, 0.2210, 0.1406, 0.3924], device='cuda:0'), in_proj_covar=tensor([0.0307, 0.0373, 0.0355, 0.0332, 0.0365, 0.0274, 0.0344, 0.0362], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:09:08,606 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331917.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:09:35,950 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=331958.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:09:39,402 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-05-19 01:09:39,687 INFO [finetune.py:992] (0/2) Epoch 19, batch 11750, loss[loss=0.2611, simple_loss=0.3152, pruned_loss=0.1035, over 6721.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.3017, pruned_loss=0.07093, over 1739049.95 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:09:49,468 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331978.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:09:53,328 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331984.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:04,053 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-232000.pt 2023-05-19 01:10:11,325 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.185e+02 3.441e+02 4.105e+02 4.910e+02 1.296e+03, threshold=8.210e+02, percent-clipped=2.0 2023-05-19 01:10:11,421 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332006.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:14,080 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332010.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:15,067 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.89 vs. limit=5.0 2023-05-19 01:10:16,510 INFO [finetune.py:992] (0/2) Epoch 19, batch 11800, loss[loss=0.208, simple_loss=0.2948, pruned_loss=0.06064, over 10209.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.3051, pruned_loss=0.07296, over 1718560.23 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:10:27,649 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-05-19 01:10:37,654 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332045.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:49,833 INFO [finetune.py:992] (0/2) Epoch 19, batch 11850, loss[loss=0.1907, simple_loss=0.2914, pruned_loss=0.04505, over 11195.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.3073, pruned_loss=0.0741, over 1692326.16 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:11:00,055 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. 
limit=2.0 2023-05-19 01:11:07,975 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332090.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:11:18,520 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.422e+02 3.367e+02 4.107e+02 5.003e+02 1.129e+03, threshold=8.213e+02, percent-clipped=1.0 2023-05-19 01:11:23,663 INFO [finetune.py:992] (0/2) Epoch 19, batch 11900, loss[loss=0.2871, simple_loss=0.3465, pruned_loss=0.1139, over 6865.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.3068, pruned_loss=0.07306, over 1684808.29 frames. ], batch size: 97, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:11:35,240 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 01:11:48,573 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=332150.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:11:49,523 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.40 vs. limit=5.0 2023-05-19 01:11:51,737 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332155.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:11:55,434 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2023-05-19 01:11:57,575 INFO [finetune.py:992] (0/2) Epoch 19, batch 11950, loss[loss=0.1909, simple_loss=0.2735, pruned_loss=0.05409, over 6756.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.3028, pruned_loss=0.06966, over 1687662.22 frames. ], batch size: 99, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:12:26,659 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.981e+02 3.431e+02 3.993e+02 6.682e+02, threshold=6.862e+02, percent-clipped=0.0 2023-05-19 01:12:30,217 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332211.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:12:32,087 INFO [finetune.py:992] (0/2) Epoch 19, batch 12000, loss[loss=0.1709, simple_loss=0.261, pruned_loss=0.04042, over 7166.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.298, pruned_loss=0.06616, over 1689019.34 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:12:32,087 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 01:12:39,493 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5063, 4.2473, 4.4656, 4.1822, 4.2217, 4.2220, 4.4839, 4.5574], device='cuda:0'), covar=tensor([0.0433, 0.0471, 0.0409, 0.0254, 0.0573, 0.0421, 0.0316, 0.0268], device='cuda:0'), in_proj_covar=tensor([0.0268, 0.0269, 0.0292, 0.0264, 0.0265, 0.0263, 0.0241, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:12:47,179 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7476, 2.0595, 2.7353, 2.7899, 2.7935, 2.8732, 2.7409, 2.2145], device='cuda:0'), covar=tensor([0.0128, 0.0356, 0.0187, 0.0107, 0.0151, 0.0108, 0.0155, 0.0376], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0119, 0.0102, 0.0079, 0.0102, 0.0114, 0.0101, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:12:49,555 INFO [finetune.py:1026] (0/2) Epoch 19, validation: loss=0.2852, simple_loss=0.36, pruned_loss=0.1052, over 1020973.00 frames. 
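[editor's note] The recurring [optim.py:368] entries in this log report five quantiles (min, 25%, median, 75%, max) of recent per-batch gradient norms, a clipping threshold that in this run sits at roughly clipping_scale (2.0) times the logged median, and the percentage of recent batches whose gradients were scaled down. A minimal Python sketch of how such statistics could be tracked is given below; it is an illustration under those assumptions only, not the actual icefall optim.py code, and the GradNormTracker name and the window size are invented:

    from collections import deque
    import torch

    class GradNormTracker:
        """Hypothetical helper mimicking the 'grad-norm quartiles ... threshold=...
        percent-clipped=...' log lines; not the real icefall optim.py code."""

        def __init__(self, window: int = 200, clipping_scale: float = 2.0):
            self.norms = deque(maxlen=window)    # recent per-batch gradient norms
            self.clipped = deque(maxlen=window)  # 1.0 where that batch was clipped
            self.clipping_scale = clipping_scale

        def update(self, model: torch.nn.Module) -> float:
            grads = [p.grad.detach().flatten()
                     for p in model.parameters() if p.grad is not None]
            if not grads:
                return 1.0
            norm = torch.cat(grads).norm().item()
            # Assumed rule: threshold = clipping_scale * median of recent norms,
            # which matches threshold ~= 2 x the logged median quartile in this run.
            history = list(self.norms) or [norm]
            threshold = self.clipping_scale * torch.tensor(history).median().item()
            scale = min(1.0, threshold / (norm + 1e-20))
            if scale < 1.0:
                for p in model.parameters():
                    if p.grad is not None:
                        p.grad.mul_(scale)
            self.norms.append(norm)
            self.clipped.append(1.0 if scale < 1.0 else 0.0)
            return scale

        def summary(self) -> str:
            t = torch.tensor(list(self.norms))
            q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            pct = 100.0 * sum(self.clipped) / max(len(self.clipped), 1)
            quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
            threshold = self.clipping_scale * t.median().item()
            return (f"grad-norm quartiles {quartiles}, "
                    f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

The "Maximum memory allocated so far is ...MB" entries that follow each validation loss are presumably the running CUDA peak from torch.cuda.max_memory_allocated(), reported in megabytes.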
2023-05-19 01:12:49,556 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 01:13:23,350 INFO [finetune.py:992] (0/2) Epoch 19, batch 12050, loss[loss=0.199, simple_loss=0.2894, pruned_loss=0.05426, over 11219.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2944, pruned_loss=0.06379, over 1672432.66 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:13:29,314 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332273.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:13:49,767 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.177e+02 2.974e+02 3.380e+02 4.027e+02 9.522e+02, threshold=6.760e+02, percent-clipped=1.0 2023-05-19 01:13:52,271 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332310.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:13:54,721 INFO [finetune.py:992] (0/2) Epoch 19, batch 12100, loss[loss=0.1878, simple_loss=0.2826, pruned_loss=0.04646, over 11818.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2933, pruned_loss=0.06275, over 1675843.67 frames. ], batch size: 44, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:14:04,835 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-19 01:14:11,459 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332340.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:14:15,716 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.65 vs. limit=2.0 2023-05-19 01:14:22,552 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332358.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:14:26,257 INFO [finetune.py:992] (0/2) Epoch 19, batch 12150, loss[loss=0.2194, simple_loss=0.2947, pruned_loss=0.07201, over 7250.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2946, pruned_loss=0.06381, over 1672943.65 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:14:28,576 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-05-19 01:14:42,990 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332390.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:14:53,113 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 3.104e+02 3.637e+02 4.356e+02 7.148e+02, threshold=7.274e+02, percent-clipped=3.0 2023-05-19 01:14:57,944 INFO [finetune.py:992] (0/2) Epoch 19, batch 12200, loss[loss=0.2378, simple_loss=0.3118, pruned_loss=0.08188, over 7025.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2953, pruned_loss=0.06422, over 1663816.49 frames. ], batch size: 97, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:15:12,356 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332438.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:15:19,198 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/epoch-19.pt 2023-05-19 01:15:38,553 INFO [finetune.py:992] (0/2) Epoch 20, batch 0, loss[loss=0.1426, simple_loss=0.2307, pruned_loss=0.02726, over 12034.00 frames. ], tot_loss[loss=0.1426, simple_loss=0.2307, pruned_loss=0.02726, over 12034.00 frames. 
], batch size: 31, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:15:38,554 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 01:15:54,186 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9587, 4.6060, 4.9760, 4.5688, 4.7629, 4.6339, 4.9605, 4.9274], device='cuda:0'), covar=tensor([0.0370, 0.0388, 0.0264, 0.0249, 0.0411, 0.0352, 0.0241, 0.0163], device='cuda:0'), in_proj_covar=tensor([0.0264, 0.0266, 0.0287, 0.0261, 0.0262, 0.0259, 0.0238, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:15:54,734 INFO [finetune.py:1026] (0/2) Epoch 20, validation: loss=0.2858, simple_loss=0.3601, pruned_loss=0.1058, over 1020973.00 frames. 2023-05-19 01:15:54,734 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 01:15:59,712 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332455.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:16:30,248 INFO [finetune.py:992] (0/2) Epoch 20, batch 50, loss[loss=0.2154, simple_loss=0.2831, pruned_loss=0.07384, over 7519.00 frames. ], tot_loss[loss=0.171, simple_loss=0.2632, pruned_loss=0.03944, over 538907.53 frames. ], batch size: 97, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:16:33,727 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332503.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:16:35,795 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.897e+02 3.544e+02 4.170e+02 7.708e+02, threshold=7.088e+02, percent-clipped=2.0 2023-05-19 01:16:35,883 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332506.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:16:45,023 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0797, 2.5553, 3.7234, 3.1056, 3.4775, 3.2242, 2.7178, 3.5451], device='cuda:0'), covar=tensor([0.0198, 0.0490, 0.0184, 0.0334, 0.0210, 0.0249, 0.0434, 0.0189], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0202, 0.0188, 0.0184, 0.0214, 0.0165, 0.0196, 0.0190], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:16:48,398 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=332524.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:16:48,556 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. 
limit=2.0 2023-05-19 01:16:58,138 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9227, 2.5017, 3.5570, 2.9540, 3.3531, 3.1465, 2.5600, 3.4559], device='cuda:0'), covar=tensor([0.0205, 0.0449, 0.0204, 0.0314, 0.0208, 0.0250, 0.0450, 0.0171], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0202, 0.0189, 0.0185, 0.0214, 0.0165, 0.0196, 0.0191], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:17:03,708 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6288, 2.7440, 4.6094, 4.7003, 2.8480, 2.5625, 2.8655, 2.0806], device='cuda:0'), covar=tensor([0.1940, 0.3303, 0.0489, 0.0494, 0.1476, 0.3090, 0.3249, 0.4969], device='cuda:0'), in_proj_covar=tensor([0.0310, 0.0392, 0.0280, 0.0304, 0.0280, 0.0324, 0.0406, 0.0381], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:17:04,755 INFO [finetune.py:992] (0/2) Epoch 20, batch 100, loss[loss=0.1999, simple_loss=0.2954, pruned_loss=0.05222, over 12293.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.2617, pruned_loss=0.0389, over 949191.52 frames. ], batch size: 37, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:17:22,855 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332573.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:17:27,725 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0042, 4.7146, 4.7082, 4.7777, 4.6166, 4.9515, 4.7476, 2.3879], device='cuda:0'), covar=tensor([0.0108, 0.0070, 0.0113, 0.0067, 0.0070, 0.0107, 0.0085, 0.1009], device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0083, 0.0086, 0.0076, 0.0063, 0.0097, 0.0084, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:17:31,136 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332585.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:17:39,757 INFO [finetune.py:992] (0/2) Epoch 20, batch 150, loss[loss=0.1724, simple_loss=0.2692, pruned_loss=0.03775, over 12128.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2589, pruned_loss=0.03749, over 1271300.31 frames. ], batch size: 39, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:17:45,532 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 2.484e+02 2.986e+02 3.369e+02 5.653e+02, threshold=5.971e+02, percent-clipped=0.0 2023-05-19 01:17:55,951 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-05-19 01:17:56,335 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332621.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:17:57,969 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. 
limit=2.0 2023-05-19 01:18:08,504 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7390, 2.9124, 3.3542, 4.5204, 2.5313, 4.5448, 4.6589, 4.7571], device='cuda:0'), covar=tensor([0.0105, 0.1257, 0.0523, 0.0159, 0.1443, 0.0234, 0.0155, 0.0093], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0204, 0.0182, 0.0122, 0.0189, 0.0179, 0.0178, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:18:09,124 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332640.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:18:12,312 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 01:18:14,482 INFO [finetune.py:992] (0/2) Epoch 20, batch 200, loss[loss=0.1845, simple_loss=0.283, pruned_loss=0.04294, over 12033.00 frames. ], tot_loss[loss=0.1673, simple_loss=0.2586, pruned_loss=0.03804, over 1521182.22 frames. ], batch size: 42, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:18:42,198 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332688.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:18:49,160 INFO [finetune.py:992] (0/2) Epoch 20, batch 250, loss[loss=0.164, simple_loss=0.2627, pruned_loss=0.03264, over 12099.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.2587, pruned_loss=0.03817, over 1708970.81 frames. ], batch size: 33, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:18:54,812 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.640e+02 2.995e+02 3.621e+02 7.109e+02, threshold=5.991e+02, percent-clipped=2.0 2023-05-19 01:19:04,922 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.36 vs. limit=5.0 2023-05-19 01:19:24,208 INFO [finetune.py:992] (0/2) Epoch 20, batch 300, loss[loss=0.1631, simple_loss=0.2547, pruned_loss=0.03575, over 12156.00 frames. ], tot_loss[loss=0.1667, simple_loss=0.2574, pruned_loss=0.03803, over 1860483.27 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:19:42,298 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3662, 4.7245, 3.1005, 2.7023, 4.0688, 2.7646, 3.8559, 3.4137], device='cuda:0'), covar=tensor([0.0788, 0.0566, 0.1284, 0.1698, 0.0335, 0.1370, 0.0651, 0.0824], device='cuda:0'), in_proj_covar=tensor([0.0187, 0.0254, 0.0177, 0.0202, 0.0142, 0.0183, 0.0197, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:19:47,245 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.30 vs. limit=5.0 2023-05-19 01:19:49,911 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7621, 2.8619, 4.7102, 4.7258, 2.7641, 2.6033, 2.8143, 2.1156], device='cuda:0'), covar=tensor([0.1832, 0.3438, 0.0449, 0.0481, 0.1499, 0.2817, 0.3333, 0.4738], device='cuda:0'), in_proj_covar=tensor([0.0309, 0.0392, 0.0279, 0.0304, 0.0280, 0.0324, 0.0406, 0.0380], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:19:59,226 INFO [finetune.py:992] (0/2) Epoch 20, batch 350, loss[loss=0.1617, simple_loss=0.2486, pruned_loss=0.03738, over 12113.00 frames. ], tot_loss[loss=0.167, simple_loss=0.2583, pruned_loss=0.03787, over 1975258.03 frames. 
], batch size: 33, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:20:01,781 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2300, 3.4954, 3.5621, 3.9092, 2.8458, 3.4501, 2.6408, 3.4066], device='cuda:0'), covar=tensor([0.1743, 0.0921, 0.1018, 0.0757, 0.1204, 0.0776, 0.1857, 0.1133], device='cuda:0'), in_proj_covar=tensor([0.0232, 0.0271, 0.0296, 0.0357, 0.0245, 0.0245, 0.0265, 0.0368], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:20:05,004 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+02 2.629e+02 3.196e+02 3.896e+02 7.714e+02, threshold=6.392e+02, percent-clipped=2.0 2023-05-19 01:20:05,130 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332806.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:20:21,544 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=332829.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:20:26,545 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6003, 2.4230, 4.6187, 4.8881, 2.8441, 2.5324, 2.7635, 2.0049], device='cuda:0'), covar=tensor([0.2114, 0.4282, 0.0565, 0.0398, 0.1481, 0.3178, 0.3773, 0.5937], device='cuda:0'), in_proj_covar=tensor([0.0311, 0.0395, 0.0281, 0.0306, 0.0282, 0.0326, 0.0408, 0.0382], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:20:34,313 INFO [finetune.py:992] (0/2) Epoch 20, batch 400, loss[loss=0.1584, simple_loss=0.247, pruned_loss=0.03485, over 12163.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.2562, pruned_loss=0.0372, over 2072681.50 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:20:38,542 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332854.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:20:41,454 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4988, 5.0306, 5.5290, 4.8389, 5.1984, 4.9350, 5.5702, 5.1462], device='cuda:0'), covar=tensor([0.0267, 0.0383, 0.0242, 0.0246, 0.0425, 0.0353, 0.0186, 0.0292], device='cuda:0'), in_proj_covar=tensor([0.0273, 0.0276, 0.0298, 0.0270, 0.0272, 0.0269, 0.0246, 0.0220], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:20:57,342 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332880.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:21:04,400 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332890.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:21:07,022 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5721, 4.3213, 4.3620, 4.4748, 4.3383, 4.5166, 4.3616, 2.7285], device='cuda:0'), covar=tensor([0.0122, 0.0090, 0.0122, 0.0088, 0.0072, 0.0126, 0.0111, 0.0880], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0084, 0.0088, 0.0077, 0.0064, 0.0099, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:21:09,579 INFO [finetune.py:992] (0/2) Epoch 20, batch 450, loss[loss=0.1703, simple_loss=0.2649, pruned_loss=0.03782, over 12108.00 frames. ], tot_loss[loss=0.1649, simple_loss=0.2553, pruned_loss=0.03723, over 2148559.40 frames. 
], batch size: 33, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:21:11,528 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.71 vs. limit=5.0 2023-05-19 01:21:14,989 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.551e+02 3.007e+02 3.557e+02 1.238e+03, threshold=6.013e+02, percent-clipped=1.0 2023-05-19 01:21:44,417 INFO [finetune.py:992] (0/2) Epoch 20, batch 500, loss[loss=0.1612, simple_loss=0.2484, pruned_loss=0.03694, over 12258.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.2557, pruned_loss=0.03751, over 2201692.22 frames. ], batch size: 32, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:22:05,335 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 01:22:19,332 INFO [finetune.py:992] (0/2) Epoch 20, batch 550, loss[loss=0.1691, simple_loss=0.2617, pruned_loss=0.03827, over 12256.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2549, pruned_loss=0.0369, over 2243904.63 frames. ], batch size: 32, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:22:22,576 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333002.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:22:25,190 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.622e+02 3.186e+02 3.720e+02 6.953e+02, threshold=6.373e+02, percent-clipped=2.0 2023-05-19 01:22:30,837 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9870, 4.7998, 4.7918, 4.7812, 4.4848, 4.9528, 4.9579, 5.1429], device='cuda:0'), covar=tensor([0.0235, 0.0183, 0.0206, 0.0473, 0.0772, 0.0327, 0.0172, 0.0205], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0200, 0.0193, 0.0245, 0.0238, 0.0220, 0.0178, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2023-05-19 01:22:40,563 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333027.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:22:54,908 INFO [finetune.py:992] (0/2) Epoch 20, batch 600, loss[loss=0.1491, simple_loss=0.244, pruned_loss=0.02706, over 12293.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2542, pruned_loss=0.03641, over 2283591.61 frames. ], batch size: 33, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:23:05,529 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333063.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:23:05,577 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1920, 4.6056, 4.0659, 4.9902, 4.5627, 3.0192, 4.2110, 3.0709], device='cuda:0'), covar=tensor([0.0954, 0.0885, 0.1675, 0.0625, 0.1223, 0.1820, 0.1257, 0.3465], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0386, 0.0369, 0.0346, 0.0380, 0.0284, 0.0358, 0.0377], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:23:23,243 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333088.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:23:26,261 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 01:23:30,123 INFO [finetune.py:992] (0/2) Epoch 20, batch 650, loss[loss=0.1652, simple_loss=0.2683, pruned_loss=0.03106, over 12191.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2545, pruned_loss=0.0363, over 2316550.92 frames. 
], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:23:35,424 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.653e+02 3.049e+02 3.616e+02 6.869e+02, threshold=6.098e+02, percent-clipped=2.0 2023-05-19 01:24:04,336 INFO [finetune.py:992] (0/2) Epoch 20, batch 700, loss[loss=0.1316, simple_loss=0.2177, pruned_loss=0.02271, over 12335.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2539, pruned_loss=0.03635, over 2332216.87 frames. ], batch size: 30, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:24:26,033 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333180.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:24:30,018 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333185.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:24:38,805 INFO [finetune.py:992] (0/2) Epoch 20, batch 750, loss[loss=0.1701, simple_loss=0.2617, pruned_loss=0.03922, over 12343.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2526, pruned_loss=0.0357, over 2353034.65 frames. ], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:24:44,366 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.641e+02 2.899e+02 3.426e+02 5.740e+02, threshold=5.798e+02, percent-clipped=0.0 2023-05-19 01:24:56,044 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333222.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:24:59,969 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333228.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:25:13,940 INFO [finetune.py:992] (0/2) Epoch 20, batch 800, loss[loss=0.1481, simple_loss=0.2318, pruned_loss=0.03224, over 12335.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2527, pruned_loss=0.03608, over 2346985.47 frames. ], batch size: 30, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:25:24,538 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.36 vs. limit=5.0 2023-05-19 01:25:37,930 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333283.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:25:48,110 INFO [finetune.py:992] (0/2) Epoch 20, batch 850, loss[loss=0.1469, simple_loss=0.237, pruned_loss=0.02841, over 12297.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2521, pruned_loss=0.03593, over 2359734.50 frames. ], batch size: 33, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:25:53,810 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.586e+02 2.626e+02 3.010e+02 3.698e+02 7.947e+02, threshold=6.019e+02, percent-clipped=2.0 2023-05-19 01:26:10,404 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.51 vs. limit=5.0 2023-05-19 01:26:20,439 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1628, 2.6928, 3.6482, 3.0895, 3.4977, 3.3055, 2.6885, 3.5867], device='cuda:0'), covar=tensor([0.0154, 0.0391, 0.0153, 0.0260, 0.0182, 0.0201, 0.0406, 0.0164], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0210, 0.0198, 0.0193, 0.0225, 0.0173, 0.0204, 0.0199], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:26:23,802 INFO [finetune.py:992] (0/2) Epoch 20, batch 900, loss[loss=0.1478, simple_loss=0.2348, pruned_loss=0.03038, over 12003.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.03549, over 2366813.68 frames. 
], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:26:31,019 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333358.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 01:26:48,952 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333383.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:26:59,196 INFO [finetune.py:992] (0/2) Epoch 20, batch 950, loss[loss=0.1736, simple_loss=0.2582, pruned_loss=0.04448, over 12105.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2506, pruned_loss=0.03543, over 2374488.12 frames. ], batch size: 42, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:27:05,027 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.539e+02 2.535e+02 2.875e+02 3.415e+02 5.071e+02, threshold=5.750e+02, percent-clipped=0.0 2023-05-19 01:27:05,197 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333406.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:27:34,130 INFO [finetune.py:992] (0/2) Epoch 20, batch 1000, loss[loss=0.171, simple_loss=0.2617, pruned_loss=0.04015, over 11166.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2516, pruned_loss=0.03582, over 2374229.04 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:27:47,320 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333467.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:28:00,534 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333485.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:28:09,363 INFO [finetune.py:992] (0/2) Epoch 20, batch 1050, loss[loss=0.1579, simple_loss=0.2394, pruned_loss=0.03821, over 12104.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.03564, over 2373690.58 frames. ], batch size: 30, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:28:14,965 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.537e+02 2.902e+02 3.628e+02 7.311e+02, threshold=5.804e+02, percent-clipped=3.0 2023-05-19 01:28:21,238 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0273, 5.9369, 5.5497, 5.4292, 6.0214, 5.2153, 5.3872, 5.4839], device='cuda:0'), covar=tensor([0.1631, 0.0939, 0.1104, 0.1841, 0.0911, 0.2344, 0.2311, 0.1177], device='cuda:0'), in_proj_covar=tensor([0.0364, 0.0511, 0.0416, 0.0461, 0.0468, 0.0449, 0.0409, 0.0395], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:28:34,553 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333533.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:28:44,731 INFO [finetune.py:992] (0/2) Epoch 20, batch 1100, loss[loss=0.1675, simple_loss=0.2579, pruned_loss=0.03861, over 12099.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.0359, over 2373105.92 frames. ], batch size: 32, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:29:05,929 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333578.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:29:19,793 INFO [finetune.py:992] (0/2) Epoch 20, batch 1150, loss[loss=0.1509, simple_loss=0.2425, pruned_loss=0.02968, over 12179.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2522, pruned_loss=0.03567, over 2377537.14 frames. 
], batch size: 31, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:29:25,534 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.683e+02 3.097e+02 3.738e+02 5.412e+02, threshold=6.193e+02, percent-clipped=0.0 2023-05-19 01:29:55,552 INFO [finetune.py:992] (0/2) Epoch 20, batch 1200, loss[loss=0.1704, simple_loss=0.2692, pruned_loss=0.03582, over 12163.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.252, pruned_loss=0.03554, over 2384335.39 frames. ], batch size: 36, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:30:02,741 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333658.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:30:07,898 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.39 vs. limit=5.0 2023-05-19 01:30:14,091 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.76 vs. limit=5.0 2023-05-19 01:30:20,514 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333683.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:30:30,970 INFO [finetune.py:992] (0/2) Epoch 20, batch 1250, loss[loss=0.1602, simple_loss=0.2568, pruned_loss=0.03183, over 12144.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2511, pruned_loss=0.03488, over 2395572.25 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:30:36,594 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.693e+02 2.535e+02 2.875e+02 3.333e+02 5.820e+02, threshold=5.749e+02, percent-clipped=0.0 2023-05-19 01:30:36,672 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333706.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:30:53,897 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333731.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:31:05,869 INFO [finetune.py:992] (0/2) Epoch 20, batch 1300, loss[loss=0.1634, simple_loss=0.2529, pruned_loss=0.03698, over 12088.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2512, pruned_loss=0.0349, over 2399148.19 frames. ], batch size: 32, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:31:15,453 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333762.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:31:32,790 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6958, 2.7010, 4.3619, 4.4311, 2.8100, 2.5328, 3.0139, 2.1900], device='cuda:0'), covar=tensor([0.1869, 0.3381, 0.0562, 0.0518, 0.1518, 0.2919, 0.2854, 0.4338], device='cuda:0'), in_proj_covar=tensor([0.0317, 0.0401, 0.0285, 0.0311, 0.0286, 0.0332, 0.0413, 0.0388], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:31:33,516 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.47 vs. limit=2.0 2023-05-19 01:31:40,841 INFO [finetune.py:992] (0/2) Epoch 20, batch 1350, loss[loss=0.1666, simple_loss=0.2658, pruned_loss=0.03374, over 11837.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2522, pruned_loss=0.03518, over 2399138.05 frames. 
], batch size: 44, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:31:46,704 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.512e+02 2.833e+02 3.283e+02 6.251e+02, threshold=5.665e+02, percent-clipped=1.0 2023-05-19 01:32:01,398 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.4247, 3.5746, 3.2207, 3.0003, 2.7941, 2.6761, 3.4616, 2.2369], device='cuda:0'), covar=tensor([0.0478, 0.0149, 0.0220, 0.0248, 0.0487, 0.0412, 0.0168, 0.0597], device='cuda:0'), in_proj_covar=tensor([0.0202, 0.0169, 0.0175, 0.0198, 0.0207, 0.0206, 0.0180, 0.0213], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:32:16,461 INFO [finetune.py:992] (0/2) Epoch 20, batch 1400, loss[loss=0.1758, simple_loss=0.2613, pruned_loss=0.04514, over 12037.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2518, pruned_loss=0.03515, over 2394557.62 frames. ], batch size: 42, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:32:16,643 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2811, 5.1505, 5.0907, 5.1099, 4.8205, 5.2431, 5.2769, 5.3695], device='cuda:0'), covar=tensor([0.0254, 0.0160, 0.0213, 0.0355, 0.0743, 0.0338, 0.0179, 0.0191], device='cuda:0'), in_proj_covar=tensor([0.0204, 0.0206, 0.0199, 0.0255, 0.0247, 0.0228, 0.0185, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 01:32:37,603 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333878.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:32:51,690 INFO [finetune.py:992] (0/2) Epoch 20, batch 1450, loss[loss=0.1701, simple_loss=0.2593, pruned_loss=0.04042, over 12355.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2515, pruned_loss=0.03509, over 2390477.61 frames. ], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:32:57,260 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.486e+02 2.911e+02 3.434e+02 5.921e+02, threshold=5.822e+02, percent-clipped=1.0 2023-05-19 01:33:06,566 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-05-19 01:33:10,496 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3122, 3.1561, 3.1988, 3.3830, 2.4437, 3.2007, 2.6378, 2.9439], device='cuda:0'), covar=tensor([0.1513, 0.0807, 0.0792, 0.0614, 0.1091, 0.0787, 0.1500, 0.0724], device='cuda:0'), in_proj_covar=tensor([0.0236, 0.0276, 0.0302, 0.0365, 0.0249, 0.0249, 0.0269, 0.0375], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:33:10,955 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333926.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:33:23,195 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.1350, 4.0474, 4.1123, 4.3531, 2.9870, 4.0609, 2.6538, 4.1196], device='cuda:0'), covar=tensor([0.1917, 0.0805, 0.0901, 0.0666, 0.1280, 0.0652, 0.2048, 0.1153], device='cuda:0'), in_proj_covar=tensor([0.0237, 0.0276, 0.0303, 0.0366, 0.0250, 0.0250, 0.0270, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:33:27,094 INFO [finetune.py:992] (0/2) Epoch 20, batch 1500, loss[loss=0.131, simple_loss=0.2138, pruned_loss=0.02411, over 12280.00 frames. 
], tot_loss[loss=0.1602, simple_loss=0.2507, pruned_loss=0.03482, over 2388651.40 frames. ], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:33:48,677 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8783, 2.9626, 4.7312, 4.8326, 2.9069, 2.7228, 3.1402, 2.4072], device='cuda:0'), covar=tensor([0.1677, 0.2940, 0.0427, 0.0452, 0.1446, 0.2647, 0.2801, 0.4107], device='cuda:0'), in_proj_covar=tensor([0.0316, 0.0400, 0.0284, 0.0310, 0.0285, 0.0331, 0.0411, 0.0387], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:34:02,313 INFO [finetune.py:992] (0/2) Epoch 20, batch 1550, loss[loss=0.1643, simple_loss=0.2604, pruned_loss=0.03413, over 12157.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2515, pruned_loss=0.0352, over 2386789.98 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:34:04,046 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-234000.pt 2023-05-19 01:34:10,932 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.627e+02 3.050e+02 3.629e+02 6.621e+02, threshold=6.100e+02, percent-clipped=1.0 2023-05-19 01:34:18,012 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334016.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:34:40,150 INFO [finetune.py:992] (0/2) Epoch 20, batch 1600, loss[loss=0.1274, simple_loss=0.213, pruned_loss=0.02086, over 12266.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2518, pruned_loss=0.03535, over 2389064.74 frames. ], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:34:41,055 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8057, 2.9814, 3.3702, 4.5207, 2.6622, 4.5582, 4.7210, 4.7320], device='cuda:0'), covar=tensor([0.0129, 0.1208, 0.0553, 0.0181, 0.1432, 0.0267, 0.0156, 0.0123], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0206, 0.0186, 0.0125, 0.0192, 0.0184, 0.0183, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:34:45,196 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6292, 4.5075, 4.5882, 4.6318, 4.3655, 4.4059, 4.2216, 4.5000], device='cuda:0'), covar=tensor([0.0848, 0.0668, 0.1098, 0.0667, 0.1770, 0.1304, 0.0619, 0.1210], device='cuda:0'), in_proj_covar=tensor([0.0571, 0.0747, 0.0652, 0.0665, 0.0890, 0.0781, 0.0596, 0.0511], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:34:50,047 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=334062.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:34:50,912 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-05-19 01:35:00,449 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=334077.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:35:15,172 INFO [finetune.py:992] (0/2) Epoch 20, batch 1650, loss[loss=0.1703, simple_loss=0.268, pruned_loss=0.03634, over 12363.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2521, pruned_loss=0.03531, over 2388008.45 frames. 
], batch size: 38, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:35:20,553 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.662e+02 2.886e+02 3.519e+02 5.131e+02, threshold=5.773e+02, percent-clipped=0.0 2023-05-19 01:35:23,370 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=334110.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:35:41,281 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3984, 5.1948, 5.3018, 5.3477, 5.0260, 5.0423, 4.7583, 5.3081], device='cuda:0'), covar=tensor([0.0710, 0.0652, 0.0932, 0.0630, 0.1930, 0.1412, 0.0624, 0.1014], device='cuda:0'), in_proj_covar=tensor([0.0569, 0.0743, 0.0649, 0.0661, 0.0886, 0.0779, 0.0593, 0.0509], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:35:50,189 INFO [finetune.py:992] (0/2) Epoch 20, batch 1700, loss[loss=0.1938, simple_loss=0.2939, pruned_loss=0.04688, over 12130.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2529, pruned_loss=0.03563, over 2375782.39 frames. ], batch size: 38, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:36:16,173 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.2552, 6.1407, 5.9863, 5.4363, 5.2983, 6.1526, 5.8201, 5.5227], device='cuda:0'), covar=tensor([0.0764, 0.1153, 0.0729, 0.1832, 0.0653, 0.0707, 0.1399, 0.1044], device='cuda:0'), in_proj_covar=tensor([0.0664, 0.0595, 0.0550, 0.0676, 0.0446, 0.0773, 0.0812, 0.0594], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 01:36:24,989 INFO [finetune.py:992] (0/2) Epoch 20, batch 1750, loss[loss=0.1712, simple_loss=0.2695, pruned_loss=0.03645, over 12116.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.253, pruned_loss=0.03576, over 2366379.52 frames. ], batch size: 38, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:36:30,883 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.634e+02 3.119e+02 3.600e+02 7.041e+02, threshold=6.237e+02, percent-clipped=1.0 2023-05-19 01:36:36,159 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-05-19 01:36:59,529 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1602, 4.8787, 4.9350, 5.0609, 4.8527, 5.1290, 4.9334, 2.8383], device='cuda:0'), covar=tensor([0.0088, 0.0070, 0.0096, 0.0062, 0.0064, 0.0102, 0.0109, 0.0769], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0085, 0.0088, 0.0077, 0.0064, 0.0099, 0.0087, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:37:00,808 INFO [finetune.py:992] (0/2) Epoch 20, batch 1800, loss[loss=0.1841, simple_loss=0.2765, pruned_loss=0.04586, over 12099.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2526, pruned_loss=0.03553, over 2374824.84 frames. ], batch size: 38, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:37:36,345 INFO [finetune.py:992] (0/2) Epoch 20, batch 1850, loss[loss=0.1416, simple_loss=0.2378, pruned_loss=0.02267, over 12035.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.253, pruned_loss=0.03588, over 2364538.25 frames. 
], batch size: 31, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:37:42,080 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.589e+02 2.626e+02 2.937e+02 3.598e+02 5.513e+02, threshold=5.873e+02, percent-clipped=0.0 2023-05-19 01:37:45,219 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7559, 3.3063, 5.1949, 2.6633, 2.9357, 3.7739, 3.1471, 3.7968], device='cuda:0'), covar=tensor([0.0406, 0.1191, 0.0293, 0.1215, 0.2007, 0.1780, 0.1525, 0.1242], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0243, 0.0268, 0.0192, 0.0243, 0.0298, 0.0232, 0.0275], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:38:04,596 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2621, 4.9632, 5.0858, 5.1371, 4.9164, 5.1365, 5.0440, 2.6886], device='cuda:0'), covar=tensor([0.0086, 0.0061, 0.0074, 0.0055, 0.0053, 0.0110, 0.0069, 0.0795], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0090, 0.0079, 0.0065, 0.0100, 0.0088, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:38:11,285 INFO [finetune.py:992] (0/2) Epoch 20, batch 1900, loss[loss=0.1438, simple_loss=0.2317, pruned_loss=0.02791, over 12349.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2535, pruned_loss=0.03621, over 2362720.60 frames. ], batch size: 30, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:38:28,191 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=334372.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 01:38:46,664 INFO [finetune.py:992] (0/2) Epoch 20, batch 1950, loss[loss=0.1657, simple_loss=0.257, pruned_loss=0.03725, over 12115.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2526, pruned_loss=0.03586, over 2372082.67 frames. ], batch size: 39, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:38:53,056 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.618e+02 2.924e+02 3.581e+02 8.497e+02, threshold=5.848e+02, percent-clipped=1.0 2023-05-19 01:39:22,123 INFO [finetune.py:992] (0/2) Epoch 20, batch 2000, loss[loss=0.1596, simple_loss=0.2493, pruned_loss=0.03491, over 12259.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2529, pruned_loss=0.0359, over 2372303.57 frames. ], batch size: 37, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:39:27,712 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0973, 4.7325, 4.7815, 4.9487, 4.7514, 4.9721, 4.8508, 2.6472], device='cuda:0'), covar=tensor([0.0131, 0.0081, 0.0124, 0.0083, 0.0069, 0.0115, 0.0124, 0.0981], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0090, 0.0079, 0.0065, 0.0100, 0.0088, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:39:32,049 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7880, 2.3084, 3.2621, 2.7716, 3.0791, 2.9975, 2.3338, 3.2437], device='cuda:0'), covar=tensor([0.0179, 0.0444, 0.0196, 0.0276, 0.0192, 0.0226, 0.0440, 0.0154], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0211, 0.0199, 0.0194, 0.0226, 0.0174, 0.0205, 0.0200], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:39:35,694 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.90 vs. 
limit=5.0 2023-05-19 01:39:49,504 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9872, 4.8806, 4.8394, 4.8431, 4.5902, 4.9753, 5.0051, 5.2115], device='cuda:0'), covar=tensor([0.0268, 0.0179, 0.0220, 0.0457, 0.0788, 0.0363, 0.0167, 0.0183], device='cuda:0'), in_proj_covar=tensor([0.0206, 0.0207, 0.0201, 0.0258, 0.0248, 0.0230, 0.0187, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 01:39:56,207 INFO [finetune.py:992] (0/2) Epoch 20, batch 2050, loss[loss=0.1597, simple_loss=0.2507, pruned_loss=0.03436, over 10510.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2527, pruned_loss=0.03588, over 2374492.83 frames. ], batch size: 68, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:39:56,360 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8910, 4.5255, 4.8773, 4.3429, 4.5622, 4.3856, 4.9080, 4.5488], device='cuda:0'), covar=tensor([0.0292, 0.0421, 0.0305, 0.0270, 0.0428, 0.0377, 0.0215, 0.0429], device='cuda:0'), in_proj_covar=tensor([0.0282, 0.0285, 0.0307, 0.0277, 0.0279, 0.0277, 0.0252, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:40:01,612 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.676e+02 3.222e+02 3.975e+02 1.107e+03, threshold=6.444e+02, percent-clipped=4.0 2023-05-19 01:40:03,222 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1244, 6.0389, 5.8265, 5.3087, 5.1817, 5.9805, 5.6294, 5.2747], device='cuda:0'), covar=tensor([0.0778, 0.1087, 0.0684, 0.1908, 0.0726, 0.0779, 0.1555, 0.1056], device='cuda:0'), in_proj_covar=tensor([0.0662, 0.0592, 0.0548, 0.0674, 0.0445, 0.0769, 0.0811, 0.0592], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-19 01:40:04,619 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0093, 5.9575, 5.7201, 5.2763, 5.1794, 5.9136, 5.5320, 5.2394], device='cuda:0'), covar=tensor([0.0775, 0.1003, 0.0718, 0.1852, 0.0820, 0.0712, 0.1549, 0.1067], device='cuda:0'), in_proj_covar=tensor([0.0662, 0.0592, 0.0548, 0.0673, 0.0445, 0.0768, 0.0810, 0.0592], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:0') 2023-05-19 01:40:31,435 INFO [finetune.py:992] (0/2) Epoch 20, batch 2100, loss[loss=0.1592, simple_loss=0.252, pruned_loss=0.03317, over 11656.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2523, pruned_loss=0.03578, over 2380267.04 frames. ], batch size: 48, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:41:06,361 INFO [finetune.py:992] (0/2) Epoch 20, batch 2150, loss[loss=0.1454, simple_loss=0.2338, pruned_loss=0.02852, over 12079.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2524, pruned_loss=0.03599, over 2373046.09 frames. ], batch size: 32, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:41:12,933 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.666e+02 3.122e+02 3.748e+02 7.346e+02, threshold=6.243e+02, percent-clipped=1.0 2023-05-19 01:41:14,539 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334609.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:41:41,325 INFO [finetune.py:992] (0/2) Epoch 20, batch 2200, loss[loss=0.1663, simple_loss=0.2553, pruned_loss=0.03861, over 12057.00 frames. 
], tot_loss[loss=0.1626, simple_loss=0.2528, pruned_loss=0.03616, over 2370703.51 frames. ], batch size: 42, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:41:42,568 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-19 01:41:57,603 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=334670.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:41:58,880 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=334672.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:42:02,362 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0570, 2.5764, 3.0957, 3.8693, 2.1966, 3.9730, 4.0066, 4.1258], device='cuda:0'), covar=tensor([0.0139, 0.1225, 0.0491, 0.0207, 0.1433, 0.0246, 0.0195, 0.0111], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0208, 0.0188, 0.0127, 0.0194, 0.0186, 0.0185, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:42:03,059 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.9819, 2.2598, 3.3808, 3.9205, 3.5387, 4.0069, 3.6048, 2.8437], device='cuda:0'), covar=tensor([0.0064, 0.0494, 0.0183, 0.0071, 0.0161, 0.0091, 0.0177, 0.0420], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0126, 0.0107, 0.0084, 0.0107, 0.0120, 0.0106, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:42:17,548 INFO [finetune.py:992] (0/2) Epoch 20, batch 2250, loss[loss=0.178, simple_loss=0.2691, pruned_loss=0.04347, over 12107.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2526, pruned_loss=0.03613, over 2367480.64 frames. ], batch size: 33, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:42:23,678 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.619e+02 2.718e+02 3.096e+02 3.651e+02 6.621e+02, threshold=6.192e+02, percent-clipped=1.0 2023-05-19 01:42:32,788 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=334720.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:42:41,123 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5293, 3.6014, 3.2613, 3.0415, 2.7740, 2.6622, 3.5881, 2.2648], device='cuda:0'), covar=tensor([0.0438, 0.0149, 0.0222, 0.0275, 0.0460, 0.0516, 0.0178, 0.0589], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0168, 0.0175, 0.0198, 0.0208, 0.0206, 0.0181, 0.0213], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:42:50,626 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5690, 5.3755, 5.5201, 5.5662, 5.2276, 5.2343, 4.9347, 5.4317], device='cuda:0'), covar=tensor([0.0692, 0.0530, 0.0772, 0.0507, 0.1727, 0.1286, 0.0567, 0.1087], device='cuda:0'), in_proj_covar=tensor([0.0568, 0.0740, 0.0645, 0.0653, 0.0883, 0.0774, 0.0590, 0.0503], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:42:51,925 INFO [finetune.py:992] (0/2) Epoch 20, batch 2300, loss[loss=0.1623, simple_loss=0.2538, pruned_loss=0.03535, over 12033.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2521, pruned_loss=0.03591, over 2368136.35 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:43:26,037 INFO [finetune.py:992] (0/2) Epoch 20, batch 2350, loss[loss=0.216, simple_loss=0.3059, pruned_loss=0.06307, over 12121.00 frames. 
], tot_loss[loss=0.1617, simple_loss=0.252, pruned_loss=0.03566, over 2375202.06 frames. ], batch size: 38, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:43:30,737 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7162, 3.7951, 3.3607, 3.1814, 2.9790, 2.8909, 3.7756, 2.4718], device='cuda:0'), covar=tensor([0.0463, 0.0153, 0.0250, 0.0287, 0.0485, 0.0486, 0.0160, 0.0546], device='cuda:0'), in_proj_covar=tensor([0.0202, 0.0169, 0.0176, 0.0199, 0.0209, 0.0207, 0.0182, 0.0214], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:43:32,459 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. limit=5.0 2023-05-19 01:43:32,586 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.684e+02 2.563e+02 3.097e+02 3.636e+02 6.081e+02, threshold=6.194e+02, percent-clipped=0.0 2023-05-19 01:43:54,157 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.72 vs. limit=2.0 2023-05-19 01:44:02,070 INFO [finetune.py:992] (0/2) Epoch 20, batch 2400, loss[loss=0.1652, simple_loss=0.265, pruned_loss=0.03275, over 12255.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2518, pruned_loss=0.03534, over 2372644.32 frames. ], batch size: 37, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:44:12,859 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7315, 2.3393, 3.1385, 2.7274, 3.0378, 2.9424, 2.3616, 3.1525], device='cuda:0'), covar=tensor([0.0185, 0.0404, 0.0197, 0.0272, 0.0195, 0.0216, 0.0402, 0.0177], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0213, 0.0201, 0.0196, 0.0228, 0.0176, 0.0207, 0.0202], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:44:23,357 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 01:44:36,689 INFO [finetune.py:992] (0/2) Epoch 20, batch 2450, loss[loss=0.1556, simple_loss=0.2411, pruned_loss=0.03509, over 12345.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2513, pruned_loss=0.03531, over 2377265.16 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:44:42,908 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.723e+02 2.661e+02 3.134e+02 3.717e+02 6.499e+02, threshold=6.267e+02, percent-clipped=1.0 2023-05-19 01:44:44,088 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-05-19 01:45:08,306 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334943.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:11,540 INFO [finetune.py:992] (0/2) Epoch 20, batch 2500, loss[loss=0.1756, simple_loss=0.2608, pruned_loss=0.04518, over 11879.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2517, pruned_loss=0.03532, over 2366734.43 frames. ], batch size: 44, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:45:13,836 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334951.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:24,200 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=334965.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:45:47,870 INFO [finetune.py:992] (0/2) Epoch 20, batch 2550, loss[loss=0.1865, simple_loss=0.2749, pruned_loss=0.04909, over 8350.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2509, pruned_loss=0.03478, over 2366516.32 frames. 
], batch size: 97, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:45:52,545 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335004.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:54,437 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.500e+02 2.828e+02 3.369e+02 5.614e+02, threshold=5.656e+02, percent-clipped=0.0 2023-05-19 01:45:56,338 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 01:45:58,165 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335012.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:59,558 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=335014.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:46:23,149 INFO [finetune.py:992] (0/2) Epoch 20, batch 2600, loss[loss=0.1285, simple_loss=0.2109, pruned_loss=0.02308, over 12031.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2506, pruned_loss=0.03476, over 2368759.32 frames. ], batch size: 28, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:46:41,938 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335075.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:46:58,513 INFO [finetune.py:992] (0/2) Epoch 20, batch 2650, loss[loss=0.1527, simple_loss=0.2548, pruned_loss=0.02527, over 12020.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2517, pruned_loss=0.03503, over 2366827.99 frames. ], batch size: 42, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:47:04,866 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.548e+02 2.565e+02 2.983e+02 3.518e+02 7.202e+02, threshold=5.967e+02, percent-clipped=2.0 2023-05-19 01:47:07,128 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-19 01:47:14,539 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0921, 5.9498, 5.5349, 5.4309, 6.0633, 5.3915, 5.4710, 5.4042], device='cuda:0'), covar=tensor([0.1677, 0.0989, 0.1069, 0.1894, 0.0834, 0.2010, 0.1973, 0.1167], device='cuda:0'), in_proj_covar=tensor([0.0373, 0.0523, 0.0426, 0.0468, 0.0476, 0.0457, 0.0417, 0.0402], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:47:33,756 INFO [finetune.py:992] (0/2) Epoch 20, batch 2700, loss[loss=0.1738, simple_loss=0.275, pruned_loss=0.03628, over 12164.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2519, pruned_loss=0.03518, over 2376052.46 frames. ], batch size: 34, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:47:38,260 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-19 01:48:02,351 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=335189.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:48:08,468 INFO [finetune.py:992] (0/2) Epoch 20, batch 2750, loss[loss=0.1654, simple_loss=0.26, pruned_loss=0.03533, over 12034.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2519, pruned_loss=0.03511, over 2377404.71 frames. 
], batch size: 40, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:48:11,733 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2193, 2.6462, 3.8136, 3.1840, 3.6104, 3.3316, 2.8363, 3.7450], device='cuda:0'), covar=tensor([0.0140, 0.0381, 0.0190, 0.0265, 0.0149, 0.0221, 0.0363, 0.0118], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0214, 0.0202, 0.0197, 0.0229, 0.0178, 0.0208, 0.0203], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:48:14,915 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.685e+02 3.034e+02 3.622e+02 7.994e+02, threshold=6.067e+02, percent-clipped=2.0 2023-05-19 01:48:17,971 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4826, 2.3931, 3.1763, 4.2991, 2.1520, 4.3121, 4.4410, 4.4390], device='cuda:0'), covar=tensor([0.0179, 0.1550, 0.0611, 0.0206, 0.1636, 0.0316, 0.0231, 0.0133], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0211, 0.0191, 0.0129, 0.0196, 0.0188, 0.0189, 0.0132], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 01:48:44,015 INFO [finetune.py:992] (0/2) Epoch 20, batch 2800, loss[loss=0.161, simple_loss=0.2521, pruned_loss=0.03493, over 12126.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.252, pruned_loss=0.0355, over 2377412.58 frames. ], batch size: 39, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:48:45,563 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335250.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:48:48,325 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5904, 5.1360, 5.5937, 4.9294, 5.2751, 4.9549, 5.6644, 5.2408], device='cuda:0'), covar=tensor([0.0248, 0.0428, 0.0285, 0.0234, 0.0382, 0.0355, 0.0206, 0.0228], device='cuda:0'), in_proj_covar=tensor([0.0287, 0.0289, 0.0313, 0.0281, 0.0283, 0.0282, 0.0258, 0.0232], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:48:56,032 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335265.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 01:49:15,686 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-05-19 01:49:19,243 INFO [finetune.py:992] (0/2) Epoch 20, batch 2850, loss[loss=0.179, simple_loss=0.2694, pruned_loss=0.04432, over 12034.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2521, pruned_loss=0.03553, over 2371865.54 frames. ], batch size: 40, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:49:19,967 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335299.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:49:25,409 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.469e+02 2.827e+02 3.297e+02 5.034e+02, threshold=5.653e+02, percent-clipped=0.0 2023-05-19 01:49:25,498 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335307.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:49:29,829 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335313.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:49:54,085 INFO [finetune.py:992] (0/2) Epoch 20, batch 2900, loss[loss=0.1596, simple_loss=0.2541, pruned_loss=0.03253, over 12295.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2508, pruned_loss=0.03522, over 2377159.87 frames. 
], batch size: 34, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:49:54,208 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.9574, 5.9386, 5.6850, 5.1934, 5.0781, 5.8338, 5.5206, 5.1755], device='cuda:0'), covar=tensor([0.0686, 0.0878, 0.0685, 0.1809, 0.0912, 0.0867, 0.1575, 0.1091], device='cuda:0'), in_proj_covar=tensor([0.0669, 0.0599, 0.0554, 0.0682, 0.0452, 0.0777, 0.0821, 0.0599], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:0') 2023-05-19 01:50:09,335 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335370.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:50:29,204 INFO [finetune.py:992] (0/2) Epoch 20, batch 2950, loss[loss=0.1477, simple_loss=0.2402, pruned_loss=0.02754, over 12024.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2506, pruned_loss=0.03544, over 2374485.21 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:50:35,693 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.710e+02 2.492e+02 3.064e+02 3.622e+02 5.950e+02, threshold=6.129e+02, percent-clipped=2.0 2023-05-19 01:50:39,306 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7540, 3.0621, 4.7314, 4.8985, 2.8877, 2.6524, 3.0071, 2.3943], device='cuda:0'), covar=tensor([0.1813, 0.3055, 0.0491, 0.0453, 0.1484, 0.2762, 0.2958, 0.4038], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0403, 0.0289, 0.0316, 0.0289, 0.0334, 0.0417, 0.0391], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:51:02,299 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.22 vs. limit=5.0 2023-05-19 01:51:04,654 INFO [finetune.py:992] (0/2) Epoch 20, batch 3000, loss[loss=0.1658, simple_loss=0.2451, pruned_loss=0.04329, over 12249.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2504, pruned_loss=0.03544, over 2366058.73 frames. ], batch size: 32, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:51:04,654 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 01:51:17,035 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.6788, 5.6213, 5.5426, 4.8227, 5.0283, 5.5193, 5.1735, 5.0682], device='cuda:0'), covar=tensor([0.0614, 0.0878, 0.0541, 0.2072, 0.0528, 0.0692, 0.1384, 0.0869], device='cuda:0'), in_proj_covar=tensor([0.0667, 0.0595, 0.0551, 0.0679, 0.0450, 0.0772, 0.0817, 0.0596], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 01:51:18,804 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3828, 3.3054, 3.0599, 2.8916, 2.6723, 2.5847, 3.1675, 2.0848], device='cuda:0'), covar=tensor([0.0469, 0.0154, 0.0200, 0.0229, 0.0402, 0.0364, 0.0168, 0.0637], device='cuda:0'), in_proj_covar=tensor([0.0203, 0.0170, 0.0177, 0.0200, 0.0210, 0.0208, 0.0183, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:51:22,068 INFO [finetune.py:1026] (0/2) Epoch 20, validation: loss=0.3175, simple_loss=0.3915, pruned_loss=0.1217, over 1020973.00 frames. 
2023-05-19 01:51:22,069 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 01:51:40,142 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1708, 2.5027, 3.0696, 4.0033, 2.2611, 4.0785, 4.1122, 4.1772], device='cuda:0'), covar=tensor([0.0218, 0.1266, 0.0586, 0.0238, 0.1467, 0.0317, 0.0230, 0.0150], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0211, 0.0190, 0.0128, 0.0195, 0.0188, 0.0188, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 01:52:01,139 INFO [finetune.py:992] (0/2) Epoch 20, batch 3050, loss[loss=0.1767, simple_loss=0.2697, pruned_loss=0.0418, over 12048.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2504, pruned_loss=0.03544, over 2370079.05 frames. ], batch size: 40, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:52:07,453 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.677e+02 2.598e+02 2.989e+02 3.407e+02 6.082e+02, threshold=5.978e+02, percent-clipped=0.0 2023-05-19 01:52:25,218 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9360, 2.3900, 3.5508, 2.9453, 3.3807, 3.1532, 2.4899, 3.4415], device='cuda:0'), covar=tensor([0.0189, 0.0444, 0.0202, 0.0298, 0.0178, 0.0238, 0.0426, 0.0181], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0215, 0.0204, 0.0199, 0.0232, 0.0179, 0.0210, 0.0205], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:52:34,979 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335545.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:52:35,083 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1077, 2.2770, 2.8512, 2.8913, 2.9673, 3.1517, 2.8705, 2.4304], device='cuda:0'), covar=tensor([0.0084, 0.0368, 0.0191, 0.0086, 0.0149, 0.0105, 0.0163, 0.0360], device='cuda:0'), in_proj_covar=tensor([0.0092, 0.0125, 0.0106, 0.0083, 0.0106, 0.0119, 0.0106, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:52:36,918 INFO [finetune.py:992] (0/2) Epoch 20, batch 3100, loss[loss=0.1551, simple_loss=0.246, pruned_loss=0.03208, over 11684.00 frames. ], tot_loss[loss=0.16, simple_loss=0.2497, pruned_loss=0.03513, over 2366388.18 frames. ], batch size: 48, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:53:11,906 INFO [finetune.py:992] (0/2) Epoch 20, batch 3150, loss[loss=0.1453, simple_loss=0.2301, pruned_loss=0.03021, over 12015.00 frames. ], tot_loss[loss=0.1594, simple_loss=0.2493, pruned_loss=0.03472, over 2378020.29 frames. ], batch size: 28, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:53:12,630 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335599.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:18,350 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.672e+02 3.275e+02 4.152e+02 4.460e+03, threshold=6.549e+02, percent-clipped=10.0 2023-05-19 01:53:18,554 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335607.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:46,625 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.39 vs. 
limit=5.0 2023-05-19 01:53:46,912 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335647.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:47,590 INFO [finetune.py:992] (0/2) Epoch 20, batch 3200, loss[loss=0.1292, simple_loss=0.2096, pruned_loss=0.02436, over 12001.00 frames. ], tot_loss[loss=0.1588, simple_loss=0.2488, pruned_loss=0.03438, over 2376551.38 frames. ], batch size: 28, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:53:52,582 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335655.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:56,620 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3812, 6.1752, 5.7461, 5.6728, 6.2513, 5.5301, 5.6341, 5.7073], device='cuda:0'), covar=tensor([0.1578, 0.0931, 0.1071, 0.2074, 0.0929, 0.2238, 0.2199, 0.1116], device='cuda:0'), in_proj_covar=tensor([0.0374, 0.0526, 0.0425, 0.0471, 0.0479, 0.0461, 0.0418, 0.0403], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:54:02,847 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335670.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:54:23,196 INFO [finetune.py:992] (0/2) Epoch 20, batch 3250, loss[loss=0.1619, simple_loss=0.2555, pruned_loss=0.03416, over 12163.00 frames. ], tot_loss[loss=0.1589, simple_loss=0.2488, pruned_loss=0.03453, over 2370508.62 frames. ], batch size: 36, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:54:29,529 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.418e+02 2.914e+02 3.419e+02 5.919e+02, threshold=5.828e+02, percent-clipped=0.0 2023-05-19 01:54:37,107 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335718.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:54:45,788 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.07 vs. limit=5.0 2023-05-19 01:54:53,467 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. limit=2.0 2023-05-19 01:54:57,870 INFO [finetune.py:992] (0/2) Epoch 20, batch 3300, loss[loss=0.1677, simple_loss=0.2576, pruned_loss=0.03889, over 12058.00 frames. ], tot_loss[loss=0.1588, simple_loss=0.2489, pruned_loss=0.03435, over 2371921.90 frames. ], batch size: 37, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:55:33,089 INFO [finetune.py:992] (0/2) Epoch 20, batch 3350, loss[loss=0.171, simple_loss=0.2577, pruned_loss=0.04212, over 12118.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.2497, pruned_loss=0.03468, over 2373333.34 frames. 
], batch size: 38, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:55:39,839 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.725e+02 3.133e+02 3.598e+02 6.227e+02, threshold=6.266e+02, percent-clipped=1.0 2023-05-19 01:56:00,091 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=335835.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:03,064 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6903, 3.0173, 4.6494, 4.7450, 2.9854, 2.6276, 3.1688, 2.2175], device='cuda:0'), covar=tensor([0.1822, 0.3116, 0.0442, 0.0446, 0.1339, 0.2766, 0.2721, 0.4198], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0401, 0.0288, 0.0316, 0.0288, 0.0333, 0.0414, 0.0388], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:56:06,926 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335845.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:08,901 INFO [finetune.py:992] (0/2) Epoch 20, batch 3400, loss[loss=0.1468, simple_loss=0.2387, pruned_loss=0.02746, over 12293.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.2495, pruned_loss=0.03469, over 2368667.74 frames. ], batch size: 33, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:56:40,354 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335893.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:42,565 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335896.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:43,794 INFO [finetune.py:992] (0/2) Epoch 20, batch 3450, loss[loss=0.1598, simple_loss=0.2523, pruned_loss=0.03367, over 11829.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2502, pruned_loss=0.03495, over 2375719.91 frames. ], batch size: 44, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:56:50,221 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.551e+02 2.981e+02 3.493e+02 8.017e+02, threshold=5.963e+02, percent-clipped=1.0 2023-05-19 01:56:51,903 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-05-19 01:56:53,383 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3220, 4.8556, 5.2996, 4.6443, 4.9759, 4.7710, 5.3457, 5.0358], device='cuda:0'), covar=tensor([0.0273, 0.0402, 0.0282, 0.0269, 0.0471, 0.0323, 0.0210, 0.0245], device='cuda:0'), in_proj_covar=tensor([0.0287, 0.0289, 0.0315, 0.0284, 0.0285, 0.0282, 0.0258, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 01:57:13,794 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2527, 4.8146, 4.3688, 5.0550, 4.6466, 3.1129, 4.3001, 3.1236], device='cuda:0'), covar=tensor([0.0918, 0.0747, 0.1217, 0.0456, 0.1086, 0.1707, 0.1110, 0.3492], device='cuda:0'), in_proj_covar=tensor([0.0323, 0.0393, 0.0376, 0.0355, 0.0386, 0.0288, 0.0362, 0.0381], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:57:19,790 INFO [finetune.py:992] (0/2) Epoch 20, batch 3500, loss[loss=0.1771, simple_loss=0.2609, pruned_loss=0.04668, over 8567.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2504, pruned_loss=0.03496, over 2375882.40 frames. 
], batch size: 98, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:57:20,049 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7421, 2.7164, 3.9271, 4.0262, 2.8641, 2.6229, 2.8461, 2.2210], device='cuda:0'), covar=tensor([0.1762, 0.2983, 0.0630, 0.0598, 0.1376, 0.2666, 0.2867, 0.4192], device='cuda:0'), in_proj_covar=tensor([0.0315, 0.0399, 0.0286, 0.0315, 0.0286, 0.0331, 0.0411, 0.0386], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 01:57:45,998 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 01:57:55,106 INFO [finetune.py:992] (0/2) Epoch 20, batch 3550, loss[loss=0.1417, simple_loss=0.2315, pruned_loss=0.0259, over 12179.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2505, pruned_loss=0.03512, over 2382001.47 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:57:56,925 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-236000.pt 2023-05-19 01:58:04,240 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.683e+02 3.263e+02 3.799e+02 1.675e+03, threshold=6.525e+02, percent-clipped=3.0 2023-05-19 01:58:20,643 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6142, 3.2755, 5.0332, 2.6513, 2.8458, 3.7697, 3.1303, 3.7869], device='cuda:0'), covar=tensor([0.0484, 0.1231, 0.0363, 0.1277, 0.2042, 0.1595, 0.1520, 0.1260], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0244, 0.0271, 0.0192, 0.0244, 0.0301, 0.0234, 0.0278], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 01:58:32,806 INFO [finetune.py:992] (0/2) Epoch 20, batch 3600, loss[loss=0.1625, simple_loss=0.2545, pruned_loss=0.03526, over 12194.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.251, pruned_loss=0.03531, over 2377730.88 frames. ], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:58:53,287 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-19 01:59:08,295 INFO [finetune.py:992] (0/2) Epoch 20, batch 3650, loss[loss=0.1402, simple_loss=0.2243, pruned_loss=0.02799, over 11998.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2503, pruned_loss=0.03527, over 2377908.56 frames. ], batch size: 28, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 01:59:14,610 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.720e+02 3.014e+02 3.515e+02 1.415e+03, threshold=6.028e+02, percent-clipped=3.0 2023-05-19 01:59:43,827 INFO [finetune.py:992] (0/2) Epoch 20, batch 3700, loss[loss=0.1634, simple_loss=0.2605, pruned_loss=0.03317, over 11267.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2505, pruned_loss=0.03531, over 2375188.49 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 01:59:58,107 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-05-19 02:00:08,162 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.69 vs. limit=2.0 2023-05-19 02:00:14,028 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=336191.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:00:18,884 INFO [finetune.py:992] (0/2) Epoch 20, batch 3750, loss[loss=0.1629, simple_loss=0.2566, pruned_loss=0.03461, over 12150.00 frames. 
], tot_loss[loss=0.1599, simple_loss=0.2499, pruned_loss=0.03492, over 2372094.53 frames. ], batch size: 36, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:00:25,519 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.588e+02 2.947e+02 3.372e+02 5.172e+02, threshold=5.894e+02, percent-clipped=0.0 2023-05-19 02:00:27,117 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.4137, 3.3747, 3.1097, 2.9413, 2.6208, 2.5316, 3.3151, 2.2014], device='cuda:0'), covar=tensor([0.0475, 0.0160, 0.0203, 0.0265, 0.0437, 0.0456, 0.0166, 0.0561], device='cuda:0'), in_proj_covar=tensor([0.0202, 0.0170, 0.0176, 0.0200, 0.0209, 0.0207, 0.0183, 0.0213], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:00:38,357 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-05-19 02:00:54,658 INFO [finetune.py:992] (0/2) Epoch 20, batch 3800, loss[loss=0.1633, simple_loss=0.2571, pruned_loss=0.03481, over 12070.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2503, pruned_loss=0.03519, over 2366001.60 frames. ], batch size: 40, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:01:29,903 INFO [finetune.py:992] (0/2) Epoch 20, batch 3850, loss[loss=0.1385, simple_loss=0.2282, pruned_loss=0.02442, over 12141.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2503, pruned_loss=0.03523, over 2368412.47 frames. ], batch size: 30, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:01:35,903 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.561e+02 2.934e+02 3.404e+02 8.064e+02, threshold=5.868e+02, percent-clipped=2.0 2023-05-19 02:01:38,221 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0919, 6.0580, 5.6353, 5.5626, 6.1196, 5.4539, 5.5722, 5.5514], device='cuda:0'), covar=tensor([0.1680, 0.0892, 0.1136, 0.1714, 0.0802, 0.2142, 0.1679, 0.1082], device='cuda:0'), in_proj_covar=tensor([0.0374, 0.0520, 0.0422, 0.0466, 0.0477, 0.0459, 0.0416, 0.0403], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:02:04,519 INFO [finetune.py:992] (0/2) Epoch 20, batch 3900, loss[loss=0.1686, simple_loss=0.2589, pruned_loss=0.03915, over 12287.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2512, pruned_loss=0.03568, over 2359771.05 frames. ], batch size: 37, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:02:22,770 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6468, 2.8989, 3.9205, 2.4786, 2.6316, 3.3358, 2.8700, 3.3593], device='cuda:0'), covar=tensor([0.0567, 0.1243, 0.0454, 0.1342, 0.1830, 0.1318, 0.1376, 0.1109], device='cuda:0'), in_proj_covar=tensor([0.0248, 0.0246, 0.0272, 0.0193, 0.0246, 0.0302, 0.0235, 0.0280], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:02:39,991 INFO [finetune.py:992] (0/2) Epoch 20, batch 3950, loss[loss=0.1717, simple_loss=0.2638, pruned_loss=0.03977, over 12246.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2512, pruned_loss=0.0362, over 2347508.30 frames. 
], batch size: 32, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:02:44,533 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=336404.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:02:46,573 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.690e+02 3.132e+02 3.710e+02 7.382e+02, threshold=6.264e+02, percent-clipped=1.0 2023-05-19 02:03:14,986 INFO [finetune.py:992] (0/2) Epoch 20, batch 4000, loss[loss=0.1403, simple_loss=0.2297, pruned_loss=0.02543, over 12033.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2516, pruned_loss=0.03639, over 2349675.85 frames. ], batch size: 31, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:03:27,190 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=336465.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:03:28,587 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8346, 2.3524, 3.3097, 2.8001, 3.1555, 3.0636, 2.2776, 3.2404], device='cuda:0'), covar=tensor([0.0206, 0.0452, 0.0163, 0.0341, 0.0196, 0.0235, 0.0457, 0.0175], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0214, 0.0204, 0.0200, 0.0232, 0.0178, 0.0210, 0.0206], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:03:34,672 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0838, 5.8939, 5.5342, 5.3673, 6.0145, 5.2718, 5.3875, 5.4208], device='cuda:0'), covar=tensor([0.1582, 0.0958, 0.1033, 0.2195, 0.0876, 0.2297, 0.1961, 0.1263], device='cuda:0'), in_proj_covar=tensor([0.0377, 0.0524, 0.0427, 0.0471, 0.0479, 0.0463, 0.0419, 0.0405], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:03:45,087 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=336491.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:03:49,728 INFO [finetune.py:992] (0/2) Epoch 20, batch 4050, loss[loss=0.1358, simple_loss=0.2238, pruned_loss=0.02389, over 12291.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2518, pruned_loss=0.03614, over 2363528.35 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:03:56,645 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.492e+02 2.845e+02 3.416e+02 7.889e+02, threshold=5.690e+02, percent-clipped=2.0 2023-05-19 02:04:19,311 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=336539.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:04:23,039 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2611, 4.8513, 4.2337, 4.9814, 4.5687, 3.0165, 4.3631, 3.0580], device='cuda:0'), covar=tensor([0.0900, 0.0559, 0.1372, 0.0496, 0.1062, 0.1725, 0.1017, 0.3471], device='cuda:0'), in_proj_covar=tensor([0.0323, 0.0392, 0.0375, 0.0353, 0.0384, 0.0287, 0.0361, 0.0381], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:04:25,523 INFO [finetune.py:992] (0/2) Epoch 20, batch 4100, loss[loss=0.192, simple_loss=0.2766, pruned_loss=0.05374, over 12328.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2524, pruned_loss=0.03617, over 2360751.03 frames. ], batch size: 36, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:05:00,051 INFO [finetune.py:992] (0/2) Epoch 20, batch 4150, loss[loss=0.2165, simple_loss=0.3007, pruned_loss=0.06618, over 12153.00 frames. 
], tot_loss[loss=0.1624, simple_loss=0.2521, pruned_loss=0.03632, over 2365472.09 frames. ], batch size: 39, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:05:06,643 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.661e+02 2.584e+02 3.123e+02 3.691e+02 5.062e+02, threshold=6.246e+02, percent-clipped=0.0 2023-05-19 02:05:35,524 INFO [finetune.py:992] (0/2) Epoch 20, batch 4200, loss[loss=0.1743, simple_loss=0.2692, pruned_loss=0.0397, over 12117.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2515, pruned_loss=0.0359, over 2372380.51 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:05:49,005 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0 2023-05-19 02:06:10,471 INFO [finetune.py:992] (0/2) Epoch 20, batch 4250, loss[loss=0.1765, simple_loss=0.2752, pruned_loss=0.03888, over 12156.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2521, pruned_loss=0.03624, over 2359857.33 frames. ], batch size: 36, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:06:13,054 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 02:06:16,575 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.659e+02 3.290e+02 3.911e+02 6.894e+02, threshold=6.580e+02, percent-clipped=3.0 2023-05-19 02:06:17,547 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=336708.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:06:32,541 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0317, 2.4449, 3.5836, 2.9985, 3.3772, 3.1493, 2.5351, 3.4751], device='cuda:0'), covar=tensor([0.0170, 0.0427, 0.0165, 0.0288, 0.0183, 0.0235, 0.0438, 0.0170], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0216, 0.0205, 0.0200, 0.0233, 0.0178, 0.0210, 0.0206], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:06:44,824 INFO [finetune.py:992] (0/2) Epoch 20, batch 4300, loss[loss=0.146, simple_loss=0.2385, pruned_loss=0.02671, over 12090.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03633, over 2358132.53 frames. 
], batch size: 32, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:06:48,511 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0193, 4.6813, 4.7684, 4.9132, 4.7586, 4.9376, 4.8292, 2.5632], device='cuda:0'), covar=tensor([0.0095, 0.0078, 0.0100, 0.0063, 0.0057, 0.0105, 0.0089, 0.0995], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0088, 0.0092, 0.0080, 0.0066, 0.0102, 0.0089, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:06:53,301 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=336760.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:06:56,246 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2601, 4.6487, 4.1405, 5.0192, 4.4579, 3.0847, 4.2568, 3.0141], device='cuda:0'), covar=tensor([0.0868, 0.0805, 0.1467, 0.0471, 0.1217, 0.1648, 0.1162, 0.3561], device='cuda:0'), in_proj_covar=tensor([0.0322, 0.0389, 0.0372, 0.0351, 0.0382, 0.0285, 0.0358, 0.0377], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:06:59,673 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=336769.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:07:20,132 INFO [finetune.py:992] (0/2) Epoch 20, batch 4350, loss[loss=0.1478, simple_loss=0.2403, pruned_loss=0.02767, over 12083.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2518, pruned_loss=0.03615, over 2352197.13 frames. ], batch size: 32, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:07:26,568 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.528e+02 2.921e+02 3.513e+02 8.768e+02, threshold=5.842e+02, percent-clipped=3.0 2023-05-19 02:07:55,856 INFO [finetune.py:992] (0/2) Epoch 20, batch 4400, loss[loss=0.1351, simple_loss=0.2236, pruned_loss=0.02333, over 12338.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2525, pruned_loss=0.03622, over 2359050.08 frames. ], batch size: 30, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:08:30,667 INFO [finetune.py:992] (0/2) Epoch 20, batch 4450, loss[loss=0.1834, simple_loss=0.277, pruned_loss=0.0449, over 11237.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2521, pruned_loss=0.03582, over 2356189.24 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:08:30,808 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=336898.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:08:36,951 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.654e+02 2.944e+02 3.497e+02 1.198e+03, threshold=5.888e+02, percent-clipped=2.0 2023-05-19 02:09:05,759 INFO [finetune.py:992] (0/2) Epoch 20, batch 4500, loss[loss=0.1744, simple_loss=0.2712, pruned_loss=0.03883, over 12149.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2522, pruned_loss=0.03585, over 2359642.74 frames. 
], batch size: 39, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:09:11,444 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5020, 2.4141, 3.6170, 4.4806, 3.8614, 4.5219, 3.9175, 3.0999], device='cuda:0'), covar=tensor([0.0047, 0.0451, 0.0154, 0.0053, 0.0141, 0.0080, 0.0145, 0.0422], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0127, 0.0107, 0.0085, 0.0109, 0.0121, 0.0107, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:09:13,599 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=336959.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:09:40,741 INFO [finetune.py:992] (0/2) Epoch 20, batch 4550, loss[loss=0.1397, simple_loss=0.2179, pruned_loss=0.03068, over 11860.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2519, pruned_loss=0.03555, over 2365934.88 frames. ], batch size: 26, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:09:46,973 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.592e+02 3.019e+02 3.499e+02 5.895e+02, threshold=6.039e+02, percent-clipped=1.0 2023-05-19 02:10:10,338 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4269, 4.7548, 3.0668, 2.8033, 4.1952, 2.6811, 4.0604, 3.3846], device='cuda:0'), covar=tensor([0.0717, 0.0573, 0.1148, 0.1499, 0.0327, 0.1441, 0.0481, 0.0790], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0267, 0.0182, 0.0207, 0.0149, 0.0189, 0.0205, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:10:15,107 INFO [finetune.py:992] (0/2) Epoch 20, batch 4600, loss[loss=0.1485, simple_loss=0.2339, pruned_loss=0.03155, over 12097.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.0355, over 2375136.39 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:10:23,536 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337060.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:10:26,153 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337064.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:10:42,858 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.46 vs. limit=2.0 2023-05-19 02:10:50,047 INFO [finetune.py:992] (0/2) Epoch 20, batch 4650, loss[loss=0.1502, simple_loss=0.2382, pruned_loss=0.03112, over 12293.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2522, pruned_loss=0.03599, over 2372176.99 frames. ], batch size: 34, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:10:54,055 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-05-19 02:10:56,354 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.687e+02 2.925e+02 3.531e+02 6.038e+02, threshold=5.850e+02, percent-clipped=0.0 2023-05-19 02:10:57,100 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=337108.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:11:05,142 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-05-19 02:11:24,969 INFO [finetune.py:992] (0/2) Epoch 20, batch 4700, loss[loss=0.1386, simple_loss=0.222, pruned_loss=0.02766, over 12357.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.252, pruned_loss=0.03563, over 2381010.13 frames. 
], batch size: 30, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:11:42,673 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7659, 2.4502, 3.2331, 3.7232, 3.4629, 3.7665, 3.4913, 2.7732], device='cuda:0'), covar=tensor([0.0068, 0.0403, 0.0179, 0.0073, 0.0136, 0.0106, 0.0154, 0.0399], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0126, 0.0107, 0.0085, 0.0109, 0.0121, 0.0106, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:11:59,478 INFO [finetune.py:992] (0/2) Epoch 20, batch 4750, loss[loss=0.1586, simple_loss=0.248, pruned_loss=0.03463, over 12286.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.03565, over 2374135.50 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:11:59,694 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5147, 3.6011, 3.2225, 3.0529, 2.7471, 2.7501, 3.6709, 2.2979], device='cuda:0'), covar=tensor([0.0446, 0.0140, 0.0231, 0.0263, 0.0527, 0.0439, 0.0162, 0.0567], device='cuda:0'), in_proj_covar=tensor([0.0200, 0.0170, 0.0176, 0.0200, 0.0208, 0.0205, 0.0182, 0.0211], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:12:06,129 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.777e+02 3.175e+02 3.703e+02 5.644e+02, threshold=6.351e+02, percent-clipped=0.0 2023-05-19 02:12:12,053 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7257, 3.4209, 5.1964, 2.8602, 3.0108, 3.9118, 3.1786, 3.9426], device='cuda:0'), covar=tensor([0.0458, 0.1183, 0.0333, 0.1204, 0.2009, 0.1507, 0.1513, 0.1145], device='cuda:0'), in_proj_covar=tensor([0.0247, 0.0245, 0.0271, 0.0192, 0.0244, 0.0302, 0.0234, 0.0279], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:12:21,692 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5093, 2.5076, 3.0850, 4.2770, 2.5538, 4.3006, 4.4168, 4.4301], device='cuda:0'), covar=tensor([0.0118, 0.1490, 0.0624, 0.0156, 0.1315, 0.0296, 0.0182, 0.0105], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0210, 0.0188, 0.0128, 0.0192, 0.0187, 0.0185, 0.0129], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:12:30,539 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 02:12:35,562 INFO [finetune.py:992] (0/2) Epoch 20, batch 4800, loss[loss=0.1437, simple_loss=0.228, pruned_loss=0.02968, over 12139.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2521, pruned_loss=0.0362, over 2371557.57 frames. ], batch size: 30, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:12:39,882 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337254.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:13:03,888 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.15 vs. limit=5.0 2023-05-19 02:13:11,037 INFO [finetune.py:992] (0/2) Epoch 20, batch 4850, loss[loss=0.1827, simple_loss=0.2664, pruned_loss=0.04948, over 11819.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.03563, over 2378386.40 frames. 
], batch size: 44, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:13:11,192 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337298.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:13:16,999 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.62 vs. limit=2.0 2023-05-19 02:13:17,212 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.560e+02 3.113e+02 3.760e+02 7.486e+02, threshold=6.227e+02, percent-clipped=4.0 2023-05-19 02:13:27,670 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2515, 4.8326, 5.2500, 4.6031, 4.9286, 4.7216, 5.2948, 4.8911], device='cuda:0'), covar=tensor([0.0278, 0.0408, 0.0263, 0.0286, 0.0465, 0.0348, 0.0215, 0.0387], device='cuda:0'), in_proj_covar=tensor([0.0290, 0.0290, 0.0314, 0.0286, 0.0286, 0.0286, 0.0261, 0.0235], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:13:40,274 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1486, 4.7293, 4.8458, 4.9534, 4.8274, 5.0574, 4.8783, 2.8027], device='cuda:0'), covar=tensor([0.0077, 0.0091, 0.0097, 0.0065, 0.0050, 0.0097, 0.0104, 0.0806], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0087, 0.0092, 0.0080, 0.0067, 0.0102, 0.0089, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:13:45,739 INFO [finetune.py:992] (0/2) Epoch 20, batch 4900, loss[loss=0.1888, simple_loss=0.2829, pruned_loss=0.04729, over 12116.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2518, pruned_loss=0.03579, over 2373708.32 frames. ], batch size: 39, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:13:53,511 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337359.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:13:56,778 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337364.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:14:20,457 INFO [finetune.py:992] (0/2) Epoch 20, batch 4950, loss[loss=0.1632, simple_loss=0.2523, pruned_loss=0.03706, over 12095.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2519, pruned_loss=0.03586, over 2358736.13 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:14:26,848 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.693e+02 3.241e+02 3.801e+02 7.006e+02, threshold=6.482e+02, percent-clipped=2.0 2023-05-19 02:14:30,972 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=337412.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:14:39,344 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337424.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:14:55,655 INFO [finetune.py:992] (0/2) Epoch 20, batch 5000, loss[loss=0.1665, simple_loss=0.2579, pruned_loss=0.03758, over 11338.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2524, pruned_loss=0.03596, over 2364617.20 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:15:21,307 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337485.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:15:29,957 INFO [finetune.py:992] (0/2) Epoch 20, batch 5050, loss[loss=0.162, simple_loss=0.2549, pruned_loss=0.03454, over 12155.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2519, pruned_loss=0.03569, over 2367561.25 frames. 
], batch size: 34, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:15:36,368 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.518e+02 2.863e+02 3.518e+02 6.980e+02, threshold=5.726e+02, percent-clipped=2.0 2023-05-19 02:16:03,780 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.47 vs. limit=2.0 2023-05-19 02:16:06,007 INFO [finetune.py:992] (0/2) Epoch 20, batch 5100, loss[loss=0.152, simple_loss=0.2341, pruned_loss=0.03495, over 12179.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2502, pruned_loss=0.03523, over 2364131.98 frames. ], batch size: 29, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:16:10,294 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337554.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:20,870 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1207, 4.7074, 5.0970, 4.4661, 4.7871, 4.5458, 5.1423, 4.7504], device='cuda:0'), covar=tensor([0.0300, 0.0444, 0.0300, 0.0299, 0.0485, 0.0402, 0.0236, 0.0362], device='cuda:0'), in_proj_covar=tensor([0.0290, 0.0290, 0.0316, 0.0286, 0.0287, 0.0286, 0.0260, 0.0235], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:16:33,169 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337587.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:40,682 INFO [finetune.py:992] (0/2) Epoch 20, batch 5150, loss[loss=0.168, simple_loss=0.262, pruned_loss=0.03703, over 12167.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2496, pruned_loss=0.03493, over 2361206.23 frames. ], batch size: 36, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:16:40,883 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:43,757 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=337602.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:47,079 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.559e+02 2.941e+02 3.528e+02 7.913e+02, threshold=5.882e+02, percent-clipped=2.0 2023-05-19 02:17:15,275 INFO [finetune.py:992] (0/2) Epoch 20, batch 5200, loss[loss=0.1342, simple_loss=0.2175, pruned_loss=0.02548, over 12365.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.2489, pruned_loss=0.03465, over 2373129.72 frames. ], batch size: 30, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:17:15,465 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337648.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:17:19,480 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337654.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:17:23,734 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337659.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:17:51,180 INFO [finetune.py:992] (0/2) Epoch 20, batch 5250, loss[loss=0.1416, simple_loss=0.235, pruned_loss=0.0241, over 12304.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2503, pruned_loss=0.03499, over 2378663.64 frames. ], batch size: 34, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:17:58,057 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.650e+02 2.634e+02 3.132e+02 3.946e+02 9.591e+02, threshold=6.265e+02, percent-clipped=4.0 2023-05-19 02:18:00,525 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.63 vs. 
limit=5.0 2023-05-19 02:18:24,551 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4003, 4.6671, 2.9600, 2.6667, 4.0933, 2.6567, 3.8978, 3.3522], device='cuda:0'), covar=tensor([0.0688, 0.0591, 0.1129, 0.1690, 0.0332, 0.1439, 0.0676, 0.0870], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0269, 0.0182, 0.0208, 0.0150, 0.0190, 0.0207, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:18:25,744 INFO [finetune.py:992] (0/2) Epoch 20, batch 5300, loss[loss=0.1294, simple_loss=0.2189, pruned_loss=0.01999, over 12008.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2505, pruned_loss=0.03499, over 2374189.72 frames. ], batch size: 28, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:18:38,497 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3958, 4.1738, 4.0562, 4.5629, 3.2057, 3.9586, 2.7193, 4.1884], device='cuda:0'), covar=tensor([0.1541, 0.0665, 0.1040, 0.0627, 0.1135, 0.0650, 0.1875, 0.1104], device='cuda:0'), in_proj_covar=tensor([0.0235, 0.0275, 0.0304, 0.0368, 0.0247, 0.0251, 0.0268, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:18:48,139 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337780.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:19:00,143 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2129, 5.9935, 5.5507, 5.5276, 6.1098, 5.3729, 5.5033, 5.5594], device='cuda:0'), covar=tensor([0.1520, 0.0946, 0.1204, 0.2028, 0.0982, 0.2436, 0.1999, 0.1233], device='cuda:0'), in_proj_covar=tensor([0.0377, 0.0528, 0.0425, 0.0473, 0.0485, 0.0467, 0.0425, 0.0406], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:19:00,746 INFO [finetune.py:992] (0/2) Epoch 20, batch 5350, loss[loss=0.1476, simple_loss=0.2355, pruned_loss=0.02989, over 12118.00 frames. ], tot_loss[loss=0.16, simple_loss=0.2503, pruned_loss=0.03486, over 2373520.51 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:19:08,833 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.541e+02 2.870e+02 3.521e+02 6.072e+02, threshold=5.739e+02, percent-clipped=0.0 2023-05-19 02:19:11,069 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.2708, 2.4788, 3.4648, 4.2862, 3.7623, 4.3471, 3.8504, 3.0406], device='cuda:0'), covar=tensor([0.0050, 0.0434, 0.0171, 0.0051, 0.0134, 0.0070, 0.0126, 0.0417], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0124, 0.0106, 0.0084, 0.0107, 0.0119, 0.0105, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:19:16,103 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. 
limit=2.0 2023-05-19 02:19:17,271 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4194, 4.7765, 2.9903, 2.6530, 4.2046, 2.7149, 3.9946, 3.4318], device='cuda:0'), covar=tensor([0.0704, 0.0535, 0.1150, 0.1732, 0.0257, 0.1360, 0.0587, 0.0816], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0270, 0.0183, 0.0209, 0.0150, 0.0191, 0.0208, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:19:21,440 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.9966, 5.9605, 5.7328, 5.2364, 5.1898, 5.8632, 5.5479, 5.1944], device='cuda:0'), covar=tensor([0.0760, 0.0911, 0.0665, 0.1825, 0.0834, 0.0805, 0.1424, 0.1096], device='cuda:0'), in_proj_covar=tensor([0.0666, 0.0602, 0.0553, 0.0681, 0.0452, 0.0781, 0.0828, 0.0597], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 02:19:37,042 INFO [finetune.py:992] (0/2) Epoch 20, batch 5400, loss[loss=0.1578, simple_loss=0.2462, pruned_loss=0.03472, over 12076.00 frames. ], tot_loss[loss=0.1596, simple_loss=0.25, pruned_loss=0.0346, over 2384062.68 frames. ], batch size: 32, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:19:39,981 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337852.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:19:59,351 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2743, 4.8402, 5.2675, 4.6686, 4.9576, 4.7287, 5.3114, 4.9098], device='cuda:0'), covar=tensor([0.0282, 0.0449, 0.0288, 0.0271, 0.0443, 0.0368, 0.0212, 0.0368], device='cuda:0'), in_proj_covar=tensor([0.0289, 0.0289, 0.0315, 0.0286, 0.0286, 0.0285, 0.0259, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:20:10,625 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.67 vs. limit=5.0 2023-05-19 02:20:11,512 INFO [finetune.py:992] (0/2) Epoch 20, batch 5450, loss[loss=0.1558, simple_loss=0.2515, pruned_loss=0.03, over 11256.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2508, pruned_loss=0.03472, over 2382519.64 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:20:14,334 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337902.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:18,428 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.713e+02 3.124e+02 3.875e+02 8.180e+02, threshold=6.247e+02, percent-clipped=4.0 2023-05-19 02:20:22,123 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337913.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:43,125 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337943.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:46,500 INFO [finetune.py:992] (0/2) Epoch 20, batch 5500, loss[loss=0.1679, simple_loss=0.2562, pruned_loss=0.03975, over 11797.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.2502, pruned_loss=0.03458, over 2385907.91 frames. 
], batch size: 44, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:20:48,731 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4899, 5.0412, 5.4962, 4.8085, 5.1073, 4.9289, 5.5363, 5.0651], device='cuda:0'), covar=tensor([0.0252, 0.0387, 0.0267, 0.0268, 0.0458, 0.0345, 0.0232, 0.0265], device='cuda:0'), in_proj_covar=tensor([0.0289, 0.0290, 0.0315, 0.0287, 0.0287, 0.0285, 0.0260, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:20:50,698 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:50,817 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:57,809 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337963.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:21:22,346 INFO [finetune.py:992] (0/2) Epoch 20, batch 5550, loss[loss=0.1515, simple_loss=0.2413, pruned_loss=0.03081, over 12299.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2507, pruned_loss=0.03501, over 2376078.66 frames. ], batch size: 34, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:21:24,082 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-238000.pt 2023-05-19 02:21:28,289 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338002.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:21:32,380 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.574e+02 2.959e+02 3.514e+02 7.470e+02, threshold=5.917e+02, percent-clipped=1.0 2023-05-19 02:21:58,958 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338046.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:21:59,825 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-05-19 02:22:00,135 INFO [finetune.py:992] (0/2) Epoch 20, batch 5600, loss[loss=0.1577, simple_loss=0.2508, pruned_loss=0.03233, over 12335.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2509, pruned_loss=0.03495, over 2371717.07 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:22:22,078 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338080.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:22:22,116 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338080.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:22:35,036 INFO [finetune.py:992] (0/2) Epoch 20, batch 5650, loss[loss=0.1735, simple_loss=0.264, pruned_loss=0.04147, over 12298.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2524, pruned_loss=0.03541, over 2377845.31 frames. 
], batch size: 34, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:22:41,593 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338107.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:22:42,057 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.626e+02 3.148e+02 3.707e+02 9.372e+02, threshold=6.297e+02, percent-clipped=1.0 2023-05-19 02:22:43,763 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7781, 2.2113, 3.4214, 2.8355, 3.3213, 2.9702, 2.2016, 3.3723], device='cuda:0'), covar=tensor([0.0219, 0.0535, 0.0212, 0.0352, 0.0209, 0.0273, 0.0566, 0.0181], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0220, 0.0209, 0.0205, 0.0238, 0.0181, 0.0213, 0.0209], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:22:50,671 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7266, 3.3435, 5.1342, 2.6935, 2.9521, 3.7844, 3.2034, 3.7565], device='cuda:0'), covar=tensor([0.0453, 0.1283, 0.0286, 0.1246, 0.1985, 0.1545, 0.1489, 0.1292], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0244, 0.0269, 0.0191, 0.0242, 0.0301, 0.0232, 0.0278], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:22:54,036 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0193, 6.0202, 5.8021, 5.2457, 5.2048, 5.8953, 5.5425, 5.2964], device='cuda:0'), covar=tensor([0.0720, 0.0815, 0.0716, 0.1812, 0.0781, 0.0826, 0.1489, 0.0967], device='cuda:0'), in_proj_covar=tensor([0.0667, 0.0601, 0.0555, 0.0682, 0.0452, 0.0783, 0.0829, 0.0597], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 02:22:56,056 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338128.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:23:05,132 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338141.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:23:07,853 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0610, 5.9475, 5.5309, 5.3820, 6.0547, 5.3097, 5.4338, 5.4252], device='cuda:0'), covar=tensor([0.1942, 0.1001, 0.1122, 0.2264, 0.1013, 0.2615, 0.2166, 0.1260], device='cuda:0'), in_proj_covar=tensor([0.0378, 0.0530, 0.0426, 0.0472, 0.0487, 0.0468, 0.0424, 0.0407], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:23:10,545 INFO [finetune.py:992] (0/2) Epoch 20, batch 5700, loss[loss=0.1408, simple_loss=0.2226, pruned_loss=0.02952, over 12341.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2517, pruned_loss=0.03519, over 2383349.04 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:23:14,909 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1949, 4.7809, 5.0116, 5.0408, 4.8508, 5.0662, 4.9084, 3.0064], device='cuda:0'), covar=tensor([0.0092, 0.0084, 0.0086, 0.0066, 0.0058, 0.0102, 0.0083, 0.0750], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0091, 0.0079, 0.0065, 0.0101, 0.0087, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:23:19,232 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.14 vs. 
limit=2.0 2023-05-19 02:23:44,757 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1566, 2.3710, 3.6231, 3.0368, 3.4505, 3.2291, 2.5351, 3.4349], device='cuda:0'), covar=tensor([0.0168, 0.0555, 0.0189, 0.0329, 0.0227, 0.0250, 0.0513, 0.0268], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0220, 0.0210, 0.0206, 0.0239, 0.0182, 0.0214, 0.0210], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:23:45,267 INFO [finetune.py:992] (0/2) Epoch 20, batch 5750, loss[loss=0.1674, simple_loss=0.257, pruned_loss=0.03891, over 12341.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2511, pruned_loss=0.03509, over 2380999.65 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:23:52,352 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.434e+02 2.922e+02 3.324e+02 6.563e+02, threshold=5.844e+02, percent-clipped=1.0 2023-05-19 02:23:52,436 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338208.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:17,592 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338243.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:21,008 INFO [finetune.py:992] (0/2) Epoch 20, batch 5800, loss[loss=0.1642, simple_loss=0.2621, pruned_loss=0.03313, over 12349.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2514, pruned_loss=0.03509, over 2387043.21 frames. ], batch size: 35, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:24:25,436 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338254.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:28,106 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338258.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:51,044 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338291.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:51,213 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.5099, 4.8619, 3.2619, 2.8424, 4.2539, 2.9345, 4.1831, 3.5016], device='cuda:0'), covar=tensor([0.0776, 0.0653, 0.1086, 0.1534, 0.0340, 0.1218, 0.0547, 0.0743], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0270, 0.0183, 0.0208, 0.0150, 0.0190, 0.0207, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:24:56,405 INFO [finetune.py:992] (0/2) Epoch 20, batch 5850, loss[loss=0.1706, simple_loss=0.2657, pruned_loss=0.03775, over 10544.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2518, pruned_loss=0.03533, over 2386693.87 frames. 
], batch size: 68, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:24:59,226 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338302.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:25:03,451 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.653e+02 3.094e+02 3.992e+02 1.662e+03, threshold=6.188e+02, percent-clipped=6.0 2023-05-19 02:25:17,282 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3367, 6.0880, 5.6863, 5.6583, 6.2013, 5.3871, 5.5740, 5.6498], device='cuda:0'), covar=tensor([0.1574, 0.0938, 0.0943, 0.2012, 0.0849, 0.2172, 0.1995, 0.1184], device='cuda:0'), in_proj_covar=tensor([0.0373, 0.0523, 0.0421, 0.0466, 0.0481, 0.0461, 0.0419, 0.0402], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:25:25,611 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2767, 4.8805, 5.0229, 5.1783, 4.8260, 5.1867, 4.9806, 3.0426], device='cuda:0'), covar=tensor([0.0090, 0.0066, 0.0085, 0.0051, 0.0052, 0.0090, 0.0069, 0.0691], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0085, 0.0089, 0.0078, 0.0065, 0.0099, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:25:30,907 INFO [finetune.py:992] (0/2) Epoch 20, batch 5900, loss[loss=0.1574, simple_loss=0.2527, pruned_loss=0.03106, over 12298.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2527, pruned_loss=0.03605, over 2375202.77 frames. ], batch size: 33, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:26:06,144 INFO [finetune.py:992] (0/2) Epoch 20, batch 5950, loss[loss=0.1773, simple_loss=0.2692, pruned_loss=0.04269, over 12354.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2528, pruned_loss=0.03624, over 2368970.99 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:26:09,416 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338402.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:26:13,534 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.664e+02 2.624e+02 3.116e+02 3.722e+02 7.691e+02, threshold=6.232e+02, percent-clipped=1.0 2023-05-19 02:26:14,467 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338409.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:26:33,612 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338436.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:26:41,810 INFO [finetune.py:992] (0/2) Epoch 20, batch 6000, loss[loss=0.2035, simple_loss=0.2788, pruned_loss=0.06413, over 7950.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2536, pruned_loss=0.03653, over 2357377.10 frames. 
], batch size: 97, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:26:41,810 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 02:26:57,629 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0781, 4.9749, 5.1653, 5.0883, 4.7523, 4.7856, 4.5748, 5.0097], device='cuda:0'), covar=tensor([0.0830, 0.0549, 0.0705, 0.0598, 0.1896, 0.1476, 0.0596, 0.1066], device='cuda:0'), in_proj_covar=tensor([0.0574, 0.0752, 0.0658, 0.0669, 0.0905, 0.0785, 0.0600, 0.0509], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:26:59,424 INFO [finetune.py:1026] (0/2) Epoch 20, validation: loss=0.3087, simple_loss=0.3852, pruned_loss=0.116, over 1020973.00 frames. 2023-05-19 02:26:59,425 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 02:27:15,484 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338470.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:27:22,546 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7498, 2.9330, 4.5711, 4.7010, 2.7775, 2.6346, 3.0019, 2.1509], device='cuda:0'), covar=tensor([0.1733, 0.3115, 0.0506, 0.0450, 0.1403, 0.2591, 0.2906, 0.4165], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0402, 0.0288, 0.0316, 0.0289, 0.0331, 0.0415, 0.0388], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:27:34,775 INFO [finetune.py:992] (0/2) Epoch 20, batch 6050, loss[loss=0.1697, simple_loss=0.2595, pruned_loss=0.03995, over 12344.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2534, pruned_loss=0.03655, over 2365703.88 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:27:41,668 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.577e+02 2.670e+02 3.111e+02 3.696e+02 7.955e+02, threshold=6.222e+02, percent-clipped=2.0 2023-05-19 02:27:41,782 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338508.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:27:55,662 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9654, 3.4642, 5.3601, 2.6842, 2.8840, 3.8964, 3.2498, 3.8054], device='cuda:0'), covar=tensor([0.0345, 0.1212, 0.0202, 0.1215, 0.2076, 0.1467, 0.1512, 0.1309], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0244, 0.0269, 0.0191, 0.0242, 0.0301, 0.0232, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:28:09,943 INFO [finetune.py:992] (0/2) Epoch 20, batch 6100, loss[loss=0.1535, simple_loss=0.2513, pruned_loss=0.02788, over 12040.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2526, pruned_loss=0.03616, over 2374671.35 frames. 
], batch size: 40, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:28:15,467 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338556.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:16,951 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338558.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:19,182 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8813, 3.0543, 4.8017, 4.8895, 2.8632, 2.6795, 3.1137, 2.3514], device='cuda:0'), covar=tensor([0.1698, 0.3040, 0.0417, 0.0455, 0.1482, 0.2718, 0.2847, 0.4211], device='cuda:0'), in_proj_covar=tensor([0.0319, 0.0403, 0.0289, 0.0316, 0.0289, 0.0332, 0.0415, 0.0389], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:28:37,039 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1958, 6.1162, 5.9300, 5.5148, 5.3911, 6.1005, 5.7461, 5.4002], device='cuda:0'), covar=tensor([0.0655, 0.1025, 0.0737, 0.1838, 0.0656, 0.0705, 0.1469, 0.1042], device='cuda:0'), in_proj_covar=tensor([0.0673, 0.0606, 0.0561, 0.0689, 0.0459, 0.0791, 0.0839, 0.0606], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:0') 2023-05-19 02:28:44,425 INFO [finetune.py:992] (0/2) Epoch 20, batch 6150, loss[loss=0.1612, simple_loss=0.254, pruned_loss=0.0342, over 12113.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2524, pruned_loss=0.03593, over 2378837.65 frames. ], batch size: 39, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:28:44,581 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:50,330 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338606.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:51,689 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.539e+02 2.985e+02 3.635e+02 5.861e+02, threshold=5.970e+02, percent-clipped=0.0 2023-05-19 02:28:52,648 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8314, 2.9771, 4.6109, 4.7451, 2.8229, 2.6193, 2.9915, 2.1549], device='cuda:0'), covar=tensor([0.1688, 0.2916, 0.0499, 0.0431, 0.1464, 0.2663, 0.2942, 0.4331], device='cuda:0'), in_proj_covar=tensor([0.0319, 0.0404, 0.0289, 0.0316, 0.0289, 0.0332, 0.0415, 0.0389], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:29:19,833 INFO [finetune.py:992] (0/2) Epoch 20, batch 6200, loss[loss=0.1726, simple_loss=0.2639, pruned_loss=0.04066, over 12361.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2526, pruned_loss=0.03576, over 2384053.26 frames. 
], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:29:27,647 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338659.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:29:33,218 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3330, 4.5242, 2.6861, 2.2837, 4.1001, 2.5013, 3.8720, 3.1504], device='cuda:0'), covar=tensor([0.0723, 0.0471, 0.1260, 0.1774, 0.0238, 0.1407, 0.0508, 0.0860], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0269, 0.0182, 0.0207, 0.0150, 0.0190, 0.0207, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:29:34,563 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338669.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:29:51,641 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338693.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:29:54,801 INFO [finetune.py:992] (0/2) Epoch 20, batch 6250, loss[loss=0.1592, simple_loss=0.2404, pruned_loss=0.03899, over 12279.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2534, pruned_loss=0.03628, over 2380630.90 frames. ], batch size: 28, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:29:57,625 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338702.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:00,450 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338706.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:01,701 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.630e+02 2.964e+02 3.764e+02 6.429e+02, threshold=5.928e+02, percent-clipped=1.0 2023-05-19 02:30:17,106 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338730.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:21,117 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338736.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:28,651 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0007, 2.9514, 4.4388, 2.5904, 2.6050, 3.5069, 2.9168, 3.5693], device='cuda:0'), covar=tensor([0.0632, 0.1510, 0.0398, 0.1300, 0.2184, 0.1335, 0.1620, 0.1192], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0243, 0.0268, 0.0191, 0.0242, 0.0300, 0.0231, 0.0276], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:30:29,024 INFO [finetune.py:992] (0/2) Epoch 20, batch 6300, loss[loss=0.2141, simple_loss=0.3029, pruned_loss=0.06266, over 10709.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2538, pruned_loss=0.03651, over 2383354.88 frames. 
], batch size: 68, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:30:30,489 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338750.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:33,454 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338754.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:41,061 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338765.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:42,547 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338767.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:48,828 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338775.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:54,630 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338784.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:31:00,859 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4810, 5.0505, 5.4489, 4.8165, 5.1605, 4.8782, 5.4751, 5.1289], device='cuda:0'), covar=tensor([0.0293, 0.0361, 0.0267, 0.0250, 0.0407, 0.0339, 0.0202, 0.0276], device='cuda:0'), in_proj_covar=tensor([0.0290, 0.0289, 0.0316, 0.0285, 0.0286, 0.0286, 0.0261, 0.0236], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:31:04,118 INFO [finetune.py:992] (0/2) Epoch 20, batch 6350, loss[loss=0.1508, simple_loss=0.2371, pruned_loss=0.03223, over 12122.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2534, pruned_loss=0.03653, over 2377468.03 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:31:11,385 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.514e+02 2.979e+02 3.494e+02 1.002e+03, threshold=5.958e+02, percent-clipped=5.0 2023-05-19 02:31:21,836 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2364, 4.4173, 2.7462, 2.2695, 3.9818, 2.4107, 3.8008, 3.0174], device='cuda:0'), covar=tensor([0.0824, 0.0685, 0.1261, 0.1939, 0.0341, 0.1590, 0.0606, 0.0968], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0269, 0.0183, 0.0207, 0.0150, 0.0190, 0.0208, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:31:31,428 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338836.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 02:31:39,426 INFO [finetune.py:992] (0/2) Epoch 20, batch 6400, loss[loss=0.1473, simple_loss=0.2451, pruned_loss=0.02476, over 12197.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2531, pruned_loss=0.03625, over 2377094.75 frames. 
], batch size: 35, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:31:46,519 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3151, 4.8179, 5.2876, 4.6403, 5.0149, 4.7166, 5.3188, 4.9925], device='cuda:0'), covar=tensor([0.0274, 0.0407, 0.0282, 0.0268, 0.0408, 0.0333, 0.0195, 0.0320], device='cuda:0'), in_proj_covar=tensor([0.0290, 0.0288, 0.0316, 0.0285, 0.0286, 0.0285, 0.0260, 0.0235], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:31:50,706 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338864.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:32:11,590 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7641, 2.9422, 4.2957, 4.5257, 2.7714, 2.7103, 3.0234, 2.1865], device='cuda:0'), covar=tensor([0.1745, 0.2939, 0.0572, 0.0470, 0.1444, 0.2535, 0.2811, 0.4237], device='cuda:0'), in_proj_covar=tensor([0.0320, 0.0405, 0.0290, 0.0317, 0.0290, 0.0333, 0.0417, 0.0391], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:32:14,114 INFO [finetune.py:992] (0/2) Epoch 20, batch 6450, loss[loss=0.1448, simple_loss=0.2415, pruned_loss=0.02409, over 12087.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2528, pruned_loss=0.03603, over 2378814.69 frames. ], batch size: 32, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:32:21,072 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.589e+02 2.941e+02 3.532e+02 8.176e+02, threshold=5.882e+02, percent-clipped=1.0 2023-05-19 02:32:24,230 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.26 vs. limit=5.0 2023-05-19 02:32:33,650 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338925.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:32:49,123 INFO [finetune.py:992] (0/2) Epoch 20, batch 6500, loss[loss=0.1718, simple_loss=0.2708, pruned_loss=0.03642, over 10540.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2524, pruned_loss=0.03577, over 2381010.32 frames. ], batch size: 68, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:32:52,852 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=3.98 vs. limit=5.0 2023-05-19 02:32:53,230 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:32:56,694 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-19 02:33:04,014 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.22 vs. limit=5.0 2023-05-19 02:33:07,405 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2023-05-19 02:33:24,141 INFO [finetune.py:992] (0/2) Epoch 20, batch 6550, loss[loss=0.1761, simple_loss=0.2691, pruned_loss=0.04159, over 12069.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2528, pruned_loss=0.03614, over 2369337.63 frames. 
], batch size: 42, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:33:31,538 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.627e+02 3.179e+02 3.848e+02 7.308e+02, threshold=6.357e+02, percent-clipped=3.0 2023-05-19 02:33:43,524 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339025.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:33:59,376 INFO [finetune.py:992] (0/2) Epoch 20, batch 6600, loss[loss=0.1423, simple_loss=0.2356, pruned_loss=0.02445, over 12351.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2535, pruned_loss=0.03631, over 2364852.72 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:34:00,841 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:01,758 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2321, 4.2697, 4.1646, 4.5027, 3.1793, 4.0366, 2.7150, 4.2117], device='cuda:0'), covar=tensor([0.1719, 0.0664, 0.0898, 0.0545, 0.1202, 0.0626, 0.1953, 0.1095], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0277, 0.0305, 0.0370, 0.0249, 0.0252, 0.0268, 0.0379], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:34:10,154 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339062.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:10,965 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7625, 2.6933, 3.9691, 4.1699, 2.8730, 2.6294, 2.8595, 2.2452], device='cuda:0'), covar=tensor([0.1711, 0.2850, 0.0587, 0.0517, 0.1280, 0.2605, 0.2746, 0.4107], device='cuda:0'), in_proj_covar=tensor([0.0319, 0.0404, 0.0289, 0.0317, 0.0289, 0.0333, 0.0416, 0.0390], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:34:12,302 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339065.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:14,373 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339068.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:24,325 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-05-19 02:34:35,624 INFO [finetune.py:992] (0/2) Epoch 20, batch 6650, loss[loss=0.1295, simple_loss=0.2089, pruned_loss=0.02508, over 12295.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.253, pruned_loss=0.03586, over 2371993.60 frames. 
], batch size: 28, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:34:42,440 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.712e+02 2.578e+02 2.982e+02 3.642e+02 1.129e+03, threshold=5.965e+02, percent-clipped=1.0 2023-05-19 02:34:45,836 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339113.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:46,620 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4137, 5.2507, 5.3478, 5.3811, 5.0424, 5.0771, 4.7801, 5.3083], device='cuda:0'), covar=tensor([0.0683, 0.0566, 0.0811, 0.0528, 0.1892, 0.1298, 0.0610, 0.1027], device='cuda:0'), in_proj_covar=tensor([0.0581, 0.0757, 0.0664, 0.0670, 0.0909, 0.0790, 0.0607, 0.0515], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 02:34:46,681 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4438, 2.7319, 3.5744, 4.3878, 3.8450, 4.4412, 3.8818, 3.3215], device='cuda:0'), covar=tensor([0.0058, 0.0398, 0.0173, 0.0063, 0.0144, 0.0099, 0.0145, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0126, 0.0107, 0.0085, 0.0109, 0.0122, 0.0107, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:34:53,968 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-19 02:34:57,197 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339129.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:58,565 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339131.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:35:10,319 INFO [finetune.py:992] (0/2) Epoch 20, batch 6700, loss[loss=0.1429, simple_loss=0.2364, pruned_loss=0.02469, over 12023.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2524, pruned_loss=0.03572, over 2378505.75 frames. ], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:35:45,462 INFO [finetune.py:992] (0/2) Epoch 20, batch 6750, loss[loss=0.1438, simple_loss=0.231, pruned_loss=0.02835, over 12333.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2526, pruned_loss=0.03572, over 2372505.02 frames. ], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:35:52,624 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.574e+02 3.082e+02 3.669e+02 4.994e+02, threshold=6.164e+02, percent-clipped=0.0 2023-05-19 02:36:01,109 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339220.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:36:21,019 INFO [finetune.py:992] (0/2) Epoch 20, batch 6800, loss[loss=0.1353, simple_loss=0.2158, pruned_loss=0.02742, over 12004.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2516, pruned_loss=0.03566, over 2371120.19 frames. 
], batch size: 28, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:36:24,024 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8166, 2.9396, 4.6038, 4.7870, 2.8023, 2.6664, 3.0168, 2.3560], device='cuda:0'), covar=tensor([0.1781, 0.3184, 0.0492, 0.0481, 0.1496, 0.2722, 0.3096, 0.4016], device='cuda:0'), in_proj_covar=tensor([0.0318, 0.0403, 0.0288, 0.0317, 0.0288, 0.0332, 0.0416, 0.0389], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:36:25,231 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339254.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:36:28,725 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9874, 4.7895, 4.7686, 4.8396, 4.4394, 4.9773, 4.9865, 5.1685], device='cuda:0'), covar=tensor([0.0263, 0.0206, 0.0219, 0.0384, 0.0888, 0.0309, 0.0185, 0.0216], device='cuda:0'), in_proj_covar=tensor([0.0211, 0.0210, 0.0204, 0.0264, 0.0252, 0.0237, 0.0192, 0.0249], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:0') 2023-05-19 02:36:55,891 INFO [finetune.py:992] (0/2) Epoch 20, batch 6850, loss[loss=0.139, simple_loss=0.2277, pruned_loss=0.02509, over 12089.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2516, pruned_loss=0.03567, over 2375477.95 frames. ], batch size: 32, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:36:56,033 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339298.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:36:58,648 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339302.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:02,620 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.587e+02 2.617e+02 2.954e+02 3.584e+02 7.644e+02, threshold=5.907e+02, percent-clipped=3.0 2023-05-19 02:37:11,569 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1910, 4.8148, 4.7575, 5.0773, 4.7646, 5.0329, 4.8944, 2.5620], device='cuda:0'), covar=tensor([0.0085, 0.0070, 0.0109, 0.0050, 0.0060, 0.0101, 0.0078, 0.0929], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0090, 0.0078, 0.0066, 0.0100, 0.0087, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:37:14,290 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339325.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:18,630 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0573, 5.9798, 5.7534, 5.3081, 5.1681, 5.9416, 5.6303, 5.2355], device='cuda:0'), covar=tensor([0.0684, 0.1141, 0.0749, 0.1763, 0.0821, 0.0780, 0.1423, 0.1076], device='cuda:0'), in_proj_covar=tensor([0.0664, 0.0598, 0.0556, 0.0677, 0.0457, 0.0781, 0.0823, 0.0595], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 02:37:30,835 INFO [finetune.py:992] (0/2) Epoch 20, batch 6900, loss[loss=0.1707, simple_loss=0.2631, pruned_loss=0.03913, over 12283.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2522, pruned_loss=0.03596, over 2374024.16 frames. 
], batch size: 37, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:37:31,648 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339349.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:38,827 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339359.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:40,795 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339362.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:48,082 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339373.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:02,694 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339393.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:05,371 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339397.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:06,028 INFO [finetune.py:992] (0/2) Epoch 20, batch 6950, loss[loss=0.1351, simple_loss=0.2229, pruned_loss=0.02366, over 12115.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2513, pruned_loss=0.0355, over 2376568.95 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:38:13,415 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.407e+02 2.960e+02 3.546e+02 6.369e+02, threshold=5.921e+02, percent-clipped=2.0 2023-05-19 02:38:14,870 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339410.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:19,759 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.9109, 3.7981, 3.4612, 3.3039, 3.1523, 2.9698, 3.8426, 2.6717], device='cuda:0'), covar=tensor([0.0376, 0.0144, 0.0203, 0.0237, 0.0396, 0.0404, 0.0151, 0.0478], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0171, 0.0177, 0.0200, 0.0208, 0.0205, 0.0182, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:38:24,452 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339424.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:29,326 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339431.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:36,166 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-05-19 02:38:41,169 INFO [finetune.py:992] (0/2) Epoch 20, batch 7000, loss[loss=0.1374, simple_loss=0.2228, pruned_loss=0.02598, over 12291.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2509, pruned_loss=0.03564, over 2371421.70 frames. 
], batch size: 28, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:38:45,514 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339454.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:02,521 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339479.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:08,145 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7220, 3.0035, 4.5101, 4.6917, 2.9070, 2.6623, 3.0033, 2.1760], device='cuda:0'), covar=tensor([0.1744, 0.3045, 0.0539, 0.0485, 0.1463, 0.2592, 0.2891, 0.4394], device='cuda:0'), in_proj_covar=tensor([0.0319, 0.0404, 0.0288, 0.0318, 0.0289, 0.0332, 0.0415, 0.0389], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:39:15,904 INFO [finetune.py:992] (0/2) Epoch 20, batch 7050, loss[loss=0.1394, simple_loss=0.2258, pruned_loss=0.02651, over 12077.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2512, pruned_loss=0.03583, over 2379028.06 frames. ], batch size: 32, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:39:23,069 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.571e+02 3.008e+02 3.605e+02 9.495e+02, threshold=6.016e+02, percent-clipped=3.0 2023-05-19 02:39:27,490 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.36 vs. limit=5.0 2023-05-19 02:39:31,473 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339520.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:32,919 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5596, 5.0322, 5.5700, 4.8876, 5.1901, 4.9021, 5.5882, 5.1578], device='cuda:0'), covar=tensor([0.0302, 0.0441, 0.0257, 0.0264, 0.0430, 0.0343, 0.0208, 0.0300], device='cuda:0'), in_proj_covar=tensor([0.0294, 0.0292, 0.0318, 0.0288, 0.0288, 0.0289, 0.0265, 0.0237], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:39:44,060 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339537.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:47,939 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1397, 6.1041, 5.8178, 5.3223, 5.2697, 5.9921, 5.6407, 5.3488], device='cuda:0'), covar=tensor([0.0697, 0.0846, 0.0647, 0.1748, 0.0725, 0.0768, 0.1469, 0.1041], device='cuda:0'), in_proj_covar=tensor([0.0666, 0.0599, 0.0558, 0.0677, 0.0456, 0.0783, 0.0826, 0.0597], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 02:39:51,303 INFO [finetune.py:992] (0/2) Epoch 20, batch 7100, loss[loss=0.1868, simple_loss=0.2776, pruned_loss=0.04801, over 10365.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.252, pruned_loss=0.03605, over 2376666.67 frames. ], batch size: 68, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:40:05,212 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339568.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:40:26,039 INFO [finetune.py:992] (0/2) Epoch 20, batch 7150, loss[loss=0.1513, simple_loss=0.2405, pruned_loss=0.03099, over 12180.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2514, pruned_loss=0.03566, over 2378395.94 frames. 
], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:40:26,233 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:40:33,266 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.660e+02 2.593e+02 2.905e+02 3.471e+02 7.466e+02, threshold=5.810e+02, percent-clipped=2.0 2023-05-19 02:41:01,937 INFO [finetune.py:992] (0/2) Epoch 20, batch 7200, loss[loss=0.1764, simple_loss=0.2762, pruned_loss=0.03834, over 12356.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.03552, over 2372900.52 frames. ], batch size: 38, lr: 3.08e-03, grad_scale: 16.0 2023-05-19 02:41:06,144 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339654.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:41:37,341 INFO [finetune.py:992] (0/2) Epoch 20, batch 7250, loss[loss=0.1465, simple_loss=0.2304, pruned_loss=0.0313, over 12130.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2517, pruned_loss=0.03547, over 2372043.75 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 16.0 2023-05-19 02:41:44,198 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.590e+02 2.948e+02 3.480e+02 6.149e+02, threshold=5.896e+02, percent-clipped=1.0 2023-05-19 02:41:48,330 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339714.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:41:55,172 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339724.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:42:11,777 INFO [finetune.py:992] (0/2) Epoch 20, batch 7300, loss[loss=0.1453, simple_loss=0.2415, pruned_loss=0.02458, over 12274.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2524, pruned_loss=0.03578, over 2362317.36 frames. ], batch size: 33, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:42:12,589 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339749.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:42:14,828 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339752.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:42:28,575 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339772.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:42:30,733 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339775.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:42:30,867 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-05-19 02:42:46,927 INFO [finetune.py:992] (0/2) Epoch 20, batch 7350, loss[loss=0.2254, simple_loss=0.3068, pruned_loss=0.07198, over 8078.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.252, pruned_loss=0.03572, over 2367611.99 frames. 
], batch size: 98, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:42:54,211 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.704e+02 3.207e+02 3.847e+02 6.086e+02, threshold=6.413e+02, percent-clipped=1.0 2023-05-19 02:42:57,748 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339813.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:43:11,885 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5801, 3.0780, 3.7580, 4.6016, 3.9211, 4.7325, 4.0804, 3.5495], device='cuda:0'), covar=tensor([0.0059, 0.0356, 0.0169, 0.0055, 0.0172, 0.0085, 0.0139, 0.0352], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0127, 0.0108, 0.0086, 0.0110, 0.0122, 0.0108, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:43:13,827 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4186, 5.2005, 4.7886, 4.6714, 5.3140, 4.5748, 4.8034, 4.6118], device='cuda:0'), covar=tensor([0.1820, 0.1133, 0.1436, 0.2162, 0.1075, 0.2311, 0.2142, 0.1448], device='cuda:0'), in_proj_covar=tensor([0.0377, 0.0530, 0.0422, 0.0468, 0.0489, 0.0463, 0.0421, 0.0405], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:43:22,079 INFO [finetune.py:992] (0/2) Epoch 20, batch 7400, loss[loss=0.1509, simple_loss=0.237, pruned_loss=0.03245, over 12292.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2521, pruned_loss=0.03581, over 2370805.17 frames. ], batch size: 28, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:43:53,168 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339893.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:43:56,581 INFO [finetune.py:992] (0/2) Epoch 20, batch 7450, loss[loss=0.167, simple_loss=0.2597, pruned_loss=0.03717, over 12026.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2524, pruned_loss=0.03611, over 2369569.06 frames. ], batch size: 40, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:44:03,619 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.803e+02 3.200e+02 3.818e+02 1.775e+03, threshold=6.399e+02, percent-clipped=4.0 2023-05-19 02:44:19,931 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.6127, 5.1583, 5.5858, 4.9247, 5.2302, 5.0018, 5.6560, 5.2318], device='cuda:0'), covar=tensor([0.0280, 0.0374, 0.0271, 0.0248, 0.0403, 0.0292, 0.0184, 0.0222], device='cuda:0'), in_proj_covar=tensor([0.0291, 0.0288, 0.0316, 0.0285, 0.0287, 0.0286, 0.0262, 0.0235], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:44:32,247 INFO [finetune.py:992] (0/2) Epoch 20, batch 7500, loss[loss=0.1416, simple_loss=0.2203, pruned_loss=0.03147, over 11731.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2519, pruned_loss=0.03601, over 2367009.42 frames. ], batch size: 26, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:44:36,505 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:44:41,593 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.94 vs. 
limit=5.0 2023-05-19 02:44:43,943 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1727, 5.0169, 5.0045, 5.0112, 4.6928, 5.0534, 5.1636, 5.2987], device='cuda:0'), covar=tensor([0.0225, 0.0198, 0.0195, 0.0420, 0.0730, 0.0458, 0.0155, 0.0177], device='cuda:0'), in_proj_covar=tensor([0.0212, 0.0211, 0.0205, 0.0264, 0.0253, 0.0238, 0.0192, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0003, 0.0005], device='cuda:0') 2023-05-19 02:45:07,512 INFO [finetune.py:992] (0/2) Epoch 20, batch 7550, loss[loss=0.1581, simple_loss=0.2525, pruned_loss=0.03183, over 12348.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2521, pruned_loss=0.03635, over 2365299.48 frames. ], batch size: 36, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:45:09,210 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-240000.pt 2023-05-19 02:45:13,305 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340002.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:45:17,374 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.442e+02 2.906e+02 3.531e+02 6.907e+02, threshold=5.812e+02, percent-clipped=1.0 2023-05-19 02:45:33,038 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-19 02:45:43,103 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.1030, 6.0353, 5.7985, 5.3650, 5.2512, 5.9787, 5.6011, 5.2725], device='cuda:0'), covar=tensor([0.0633, 0.0927, 0.0707, 0.1872, 0.0658, 0.0729, 0.1515, 0.1101], device='cuda:0'), in_proj_covar=tensor([0.0663, 0.0599, 0.0556, 0.0674, 0.0451, 0.0778, 0.0824, 0.0593], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 02:45:45,072 INFO [finetune.py:992] (0/2) Epoch 20, batch 7600, loss[loss=0.1739, simple_loss=0.2671, pruned_loss=0.04034, over 12056.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2525, pruned_loss=0.03657, over 2365611.29 frames. ], batch size: 37, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:45:45,939 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:45:53,726 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.25 vs. limit=5.0 2023-05-19 02:46:00,977 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=340070.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:46:19,705 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340097.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:46:20,267 INFO [finetune.py:992] (0/2) Epoch 20, batch 7650, loss[loss=0.2193, simple_loss=0.2975, pruned_loss=0.07055, over 8305.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2527, pruned_loss=0.03665, over 2367582.93 frames. 
], batch size: 98, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:46:27,853 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.675e+02 2.965e+02 3.568e+02 6.455e+02, threshold=5.931e+02, percent-clipped=2.0 2023-05-19 02:46:27,928 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=340108.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:46:40,600 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4603, 4.8627, 3.2651, 2.8741, 4.1442, 2.8519, 4.1841, 3.4896], device='cuda:0'), covar=tensor([0.0730, 0.0523, 0.1044, 0.1564, 0.0312, 0.1272, 0.0455, 0.0778], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0268, 0.0182, 0.0206, 0.0150, 0.0188, 0.0206, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:46:55,330 INFO [finetune.py:992] (0/2) Epoch 20, batch 7700, loss[loss=0.1531, simple_loss=0.2468, pruned_loss=0.0297, over 12304.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2525, pruned_loss=0.03638, over 2369742.27 frames. ], batch size: 33, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:47:26,597 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340193.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:47:29,958 INFO [finetune.py:992] (0/2) Epoch 20, batch 7750, loss[loss=0.1954, simple_loss=0.2895, pruned_loss=0.05066, over 12263.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2526, pruned_loss=0.03657, over 2373067.86 frames. ], batch size: 37, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:47:37,892 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.635e+02 3.142e+02 3.673e+02 6.224e+02, threshold=6.284e+02, percent-clipped=2.0 2023-05-19 02:48:00,638 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340241.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:48:05,992 INFO [finetune.py:992] (0/2) Epoch 20, batch 7800, loss[loss=0.1932, simple_loss=0.2824, pruned_loss=0.052, over 12038.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2526, pruned_loss=0.03644, over 2377637.77 frames. ], batch size: 42, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:48:15,231 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1420, 2.5813, 3.5914, 3.0304, 3.4112, 3.2439, 2.6436, 3.5210], device='cuda:0'), covar=tensor([0.0144, 0.0395, 0.0180, 0.0265, 0.0195, 0.0223, 0.0409, 0.0164], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0217, 0.0208, 0.0203, 0.0236, 0.0181, 0.0212, 0.0208], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:48:28,550 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-05-19 02:48:33,560 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.87 vs. limit=5.0 2023-05-19 02:48:40,431 INFO [finetune.py:992] (0/2) Epoch 20, batch 7850, loss[loss=0.1481, simple_loss=0.2394, pruned_loss=0.02846, over 12200.00 frames. ], tot_loss[loss=0.163, simple_loss=0.253, pruned_loss=0.03654, over 2369464.07 frames. 
], batch size: 35, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:48:41,385 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.5231, 3.7678, 3.8703, 4.3567, 3.0584, 3.7070, 2.6430, 4.0374], device='cuda:0'), covar=tensor([0.1551, 0.1002, 0.1164, 0.0817, 0.1324, 0.0791, 0.1975, 0.1084], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0278, 0.0304, 0.0369, 0.0248, 0.0251, 0.0267, 0.0378], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:48:47,477 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.642e+02 2.941e+02 3.441e+02 7.528e+02, threshold=5.881e+02, percent-clipped=3.0 2023-05-19 02:48:52,264 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-05-19 02:48:52,566 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3637, 5.1318, 5.3763, 5.3556, 4.6386, 4.6841, 4.8518, 5.0532], device='cuda:0'), covar=tensor([0.1145, 0.1049, 0.1048, 0.0937, 0.3708, 0.2503, 0.0761, 0.1804], device='cuda:0'), in_proj_covar=tensor([0.0584, 0.0761, 0.0667, 0.0673, 0.0910, 0.0791, 0.0604, 0.0515], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:49:15,084 INFO [finetune.py:992] (0/2) Epoch 20, batch 7900, loss[loss=0.1899, simple_loss=0.2783, pruned_loss=0.05075, over 12043.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2525, pruned_loss=0.03638, over 2367307.41 frames. ], batch size: 42, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:49:31,155 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340370.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:49:51,037 INFO [finetune.py:992] (0/2) Epoch 20, batch 7950, loss[loss=0.1332, simple_loss=0.214, pruned_loss=0.02619, over 11903.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2524, pruned_loss=0.03589, over 2378075.31 frames. 
], batch size: 26, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:49:57,839 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9294, 3.4896, 5.3459, 2.7953, 2.9460, 3.8440, 3.3294, 3.8423], device='cuda:0'), covar=tensor([0.0372, 0.1156, 0.0277, 0.1164, 0.2075, 0.1547, 0.1370, 0.1236], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0245, 0.0271, 0.0191, 0.0243, 0.0302, 0.0234, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:49:58,271 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.568e+02 3.037e+02 3.883e+02 7.199e+02, threshold=6.075e+02, percent-clipped=3.0 2023-05-19 02:49:58,409 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340408.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:50:05,390 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340418.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:50:15,733 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3574, 4.7572, 3.1808, 2.5787, 4.1366, 2.4813, 4.0231, 3.2958], device='cuda:0'), covar=tensor([0.0670, 0.0580, 0.1010, 0.1606, 0.0294, 0.1430, 0.0539, 0.0835], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0267, 0.0181, 0.0205, 0.0149, 0.0187, 0.0205, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:50:25,834 INFO [finetune.py:992] (0/2) Epoch 20, batch 8000, loss[loss=0.1551, simple_loss=0.2439, pruned_loss=0.03311, over 12261.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2524, pruned_loss=0.03639, over 2372293.78 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:50:29,618 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4673, 2.5813, 3.5143, 4.4163, 3.8130, 4.4895, 3.9292, 3.2364], device='cuda:0'), covar=tensor([0.0053, 0.0451, 0.0190, 0.0055, 0.0155, 0.0081, 0.0185, 0.0375], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0128, 0.0108, 0.0086, 0.0110, 0.0123, 0.0107, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:50:31,586 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340456.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:50:51,207 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5354, 5.1109, 5.4194, 4.7663, 5.2570, 4.7674, 5.4962, 5.2558], device='cuda:0'), covar=tensor([0.0385, 0.0517, 0.0530, 0.0335, 0.0463, 0.0455, 0.0327, 0.0234], device='cuda:0'), in_proj_covar=tensor([0.0293, 0.0290, 0.0319, 0.0287, 0.0287, 0.0288, 0.0265, 0.0236], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:51:00,154 INFO [finetune.py:992] (0/2) Epoch 20, batch 8050, loss[loss=0.1516, simple_loss=0.2252, pruned_loss=0.03901, over 12266.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2531, pruned_loss=0.03746, over 2359262.17 frames. 
], batch size: 28, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:51:07,539 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.771e+02 3.199e+02 3.982e+02 8.262e+02, threshold=6.397e+02, percent-clipped=3.0 2023-05-19 02:51:07,757 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1005, 3.9402, 2.6488, 2.3513, 3.5433, 2.2978, 3.5822, 2.8617], device='cuda:0'), covar=tensor([0.0713, 0.0707, 0.1186, 0.1633, 0.0318, 0.1439, 0.0510, 0.0911], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0266, 0.0181, 0.0205, 0.0149, 0.0187, 0.0205, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:51:19,414 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=340525.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:51:35,847 INFO [finetune.py:992] (0/2) Epoch 20, batch 8100, loss[loss=0.1697, simple_loss=0.2629, pruned_loss=0.03829, over 12155.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2547, pruned_loss=0.03779, over 2356897.45 frames. ], batch size: 34, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:52:02,183 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=340586.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:52:10,379 INFO [finetune.py:992] (0/2) Epoch 20, batch 8150, loss[loss=0.1419, simple_loss=0.2292, pruned_loss=0.02731, over 12413.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.255, pruned_loss=0.03794, over 2357958.09 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:52:17,437 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.602e+02 3.226e+02 3.872e+02 5.642e+02, threshold=6.452e+02, percent-clipped=0.0 2023-05-19 02:52:23,743 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3044, 6.1021, 5.6882, 5.5437, 6.1984, 5.3965, 5.5721, 5.5463], device='cuda:0'), covar=tensor([0.1533, 0.1040, 0.1111, 0.2151, 0.1020, 0.2383, 0.2244, 0.1235], device='cuda:0'), in_proj_covar=tensor([0.0380, 0.0533, 0.0426, 0.0474, 0.0491, 0.0469, 0.0427, 0.0409], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:52:45,730 INFO [finetune.py:992] (0/2) Epoch 20, batch 8200, loss[loss=0.168, simple_loss=0.2516, pruned_loss=0.04224, over 12019.00 frames. ], tot_loss[loss=0.165, simple_loss=0.2545, pruned_loss=0.03777, over 2367281.15 frames. ], batch size: 31, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:53:20,902 INFO [finetune.py:992] (0/2) Epoch 20, batch 8250, loss[loss=0.1683, simple_loss=0.2631, pruned_loss=0.03673, over 11583.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2551, pruned_loss=0.03788, over 2360958.17 frames. ], batch size: 48, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:53:27,650 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.637e+02 3.079e+02 3.565e+02 9.552e+02, threshold=6.158e+02, percent-clipped=2.0 2023-05-19 02:53:55,563 INFO [finetune.py:992] (0/2) Epoch 20, batch 8300, loss[loss=0.175, simple_loss=0.2648, pruned_loss=0.04263, over 12120.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2537, pruned_loss=0.0373, over 2361433.61 frames. 
], batch size: 39, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:54:17,243 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4779, 5.2770, 5.3808, 5.4678, 5.0928, 5.1077, 4.8583, 5.3430], device='cuda:0'), covar=tensor([0.0711, 0.0602, 0.0851, 0.0628, 0.1742, 0.1452, 0.0621, 0.1209], device='cuda:0'), in_proj_covar=tensor([0.0583, 0.0764, 0.0669, 0.0677, 0.0911, 0.0796, 0.0606, 0.0517], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 02:54:19,699 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-05-19 02:54:30,931 INFO [finetune.py:992] (0/2) Epoch 20, batch 8350, loss[loss=0.1644, simple_loss=0.2601, pruned_loss=0.03439, over 12045.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.253, pruned_loss=0.03671, over 2367049.56 frames. ], batch size: 40, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:54:38,242 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.533e+02 3.136e+02 3.721e+02 8.819e+02, threshold=6.271e+02, percent-clipped=3.0 2023-05-19 02:55:06,367 INFO [finetune.py:992] (0/2) Epoch 20, batch 8400, loss[loss=0.2507, simple_loss=0.3278, pruned_loss=0.08683, over 8079.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2521, pruned_loss=0.03648, over 2369079.65 frames. ], batch size: 97, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:55:29,456 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=340881.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:55:41,232 INFO [finetune.py:992] (0/2) Epoch 20, batch 8450, loss[loss=0.1532, simple_loss=0.2319, pruned_loss=0.03721, over 11819.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2524, pruned_loss=0.03658, over 2365916.47 frames. ], batch size: 26, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:55:48,749 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.491e+02 2.887e+02 3.519e+02 8.835e+02, threshold=5.775e+02, percent-clipped=2.0 2023-05-19 02:55:56,513 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3176, 5.1222, 5.2853, 5.3159, 4.9174, 4.9831, 4.7447, 5.1579], device='cuda:0'), covar=tensor([0.0698, 0.0575, 0.0911, 0.0574, 0.1837, 0.1347, 0.0604, 0.1266], device='cuda:0'), in_proj_covar=tensor([0.0584, 0.0767, 0.0670, 0.0678, 0.0913, 0.0796, 0.0607, 0.0518], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 02:56:03,726 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 02:56:15,956 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-19 02:56:16,150 INFO [finetune.py:992] (0/2) Epoch 20, batch 8500, loss[loss=0.1703, simple_loss=0.2562, pruned_loss=0.0422, over 12024.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2523, pruned_loss=0.03662, over 2363496.80 frames. 
], batch size: 31, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:56:33,469 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7402, 3.7456, 3.3400, 3.2025, 2.9311, 2.7578, 3.7683, 2.4739], device='cuda:0'), covar=tensor([0.0422, 0.0145, 0.0229, 0.0258, 0.0480, 0.0474, 0.0141, 0.0578], device='cuda:0'), in_proj_covar=tensor([0.0202, 0.0173, 0.0180, 0.0203, 0.0211, 0.0209, 0.0184, 0.0214], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:56:51,252 INFO [finetune.py:992] (0/2) Epoch 20, batch 8550, loss[loss=0.1426, simple_loss=0.2335, pruned_loss=0.02587, over 12278.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2526, pruned_loss=0.03657, over 2368133.78 frames. ], batch size: 33, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:56:58,190 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.60 vs. limit=2.0 2023-05-19 02:56:59,231 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.772e+02 3.141e+02 3.674e+02 1.798e+03, threshold=6.281e+02, percent-clipped=4.0 2023-05-19 02:57:01,580 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1624, 4.6552, 4.0503, 4.9085, 4.5052, 2.9875, 4.1473, 2.8859], device='cuda:0'), covar=tensor([0.0892, 0.0815, 0.1541, 0.0563, 0.1186, 0.1738, 0.1134, 0.4033], device='cuda:0'), in_proj_covar=tensor([0.0321, 0.0391, 0.0373, 0.0354, 0.0380, 0.0283, 0.0359, 0.0375], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:57:21,744 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 02:57:26,111 INFO [finetune.py:992] (0/2) Epoch 20, batch 8600, loss[loss=0.1534, simple_loss=0.2474, pruned_loss=0.02968, over 12180.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2526, pruned_loss=0.03626, over 2367531.68 frames. ], batch size: 35, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:58:01,376 INFO [finetune.py:992] (0/2) Epoch 20, batch 8650, loss[loss=0.1534, simple_loss=0.2451, pruned_loss=0.03083, over 12256.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03606, over 2365618.55 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:58:09,055 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.571e+02 3.077e+02 3.507e+02 6.175e+02, threshold=6.155e+02, percent-clipped=0.0 2023-05-19 02:58:09,827 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7158, 5.4455, 5.0071, 4.9076, 5.5819, 4.7477, 4.9771, 4.9304], device='cuda:0'), covar=tensor([0.1693, 0.1154, 0.1730, 0.2422, 0.1026, 0.2590, 0.2283, 0.1383], device='cuda:0'), in_proj_covar=tensor([0.0383, 0.0534, 0.0429, 0.0477, 0.0493, 0.0472, 0.0426, 0.0410], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 02:58:27,663 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0655, 4.6337, 4.0979, 4.8221, 4.4658, 2.8837, 4.1509, 2.9542], device='cuda:0'), covar=tensor([0.1000, 0.0823, 0.1522, 0.0693, 0.1251, 0.1822, 0.1186, 0.3572], device='cuda:0'), in_proj_covar=tensor([0.0320, 0.0390, 0.0372, 0.0354, 0.0380, 0.0283, 0.0358, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 02:58:32,907 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-05-19 02:58:36,609 INFO [finetune.py:992] (0/2) Epoch 20, batch 8700, loss[loss=0.1612, simple_loss=0.2532, pruned_loss=0.03457, over 12300.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2521, pruned_loss=0.03599, over 2366164.39 frames. ], batch size: 34, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:58:46,434 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0627, 4.7201, 4.7736, 4.9903, 4.8176, 5.0121, 4.7717, 2.5940], device='cuda:0'), covar=tensor([0.0110, 0.0085, 0.0106, 0.0057, 0.0055, 0.0101, 0.0109, 0.0928], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0089, 0.0078, 0.0065, 0.0100, 0.0086, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 02:58:59,443 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=341181.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:59:11,288 INFO [finetune.py:992] (0/2) Epoch 20, batch 8750, loss[loss=0.1402, simple_loss=0.2252, pruned_loss=0.02757, over 12286.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2508, pruned_loss=0.03535, over 2378421.66 frames. ], batch size: 28, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:59:19,076 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.539e+02 3.018e+02 3.610e+02 6.559e+02, threshold=6.037e+02, percent-clipped=4.0 2023-05-19 02:59:33,508 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=341229.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:59:46,897 INFO [finetune.py:992] (0/2) Epoch 20, batch 8800, loss[loss=0.1424, simple_loss=0.2331, pruned_loss=0.02584, over 12348.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2515, pruned_loss=0.03529, over 2381010.44 frames. ], batch size: 30, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:00:21,935 INFO [finetune.py:992] (0/2) Epoch 20, batch 8850, loss[loss=0.2376, simple_loss=0.3189, pruned_loss=0.07809, over 7517.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2514, pruned_loss=0.03526, over 2379516.31 frames. ], batch size: 97, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:00:29,541 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.712e+02 3.085e+02 3.586e+02 5.651e+02, threshold=6.171e+02, percent-clipped=0.0 2023-05-19 03:00:34,127 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-05-19 03:00:56,633 INFO [finetune.py:992] (0/2) Epoch 20, batch 8900, loss[loss=0.1456, simple_loss=0.2382, pruned_loss=0.02646, over 12075.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2525, pruned_loss=0.03586, over 2377369.12 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:01:08,659 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-19 03:01:25,091 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7588, 3.2763, 5.1949, 2.7114, 2.8581, 3.8439, 3.0359, 3.8285], device='cuda:0'), covar=tensor([0.0409, 0.1290, 0.0278, 0.1167, 0.1958, 0.1535, 0.1622, 0.1154], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0245, 0.0269, 0.0190, 0.0242, 0.0300, 0.0233, 0.0276], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:01:31,957 INFO [finetune.py:992] (0/2) Epoch 20, batch 8950, loss[loss=0.1546, simple_loss=0.2429, pruned_loss=0.03317, over 12258.00 frames. 
], tot_loss[loss=0.1618, simple_loss=0.2523, pruned_loss=0.03568, over 2380739.47 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:01:39,782 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.654e+02 3.068e+02 3.706e+02 5.629e+02, threshold=6.135e+02, percent-clipped=0.0 2023-05-19 03:01:55,824 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.6133, 5.4203, 5.5348, 5.5746, 5.1867, 5.2656, 4.9828, 5.4978], device='cuda:0'), covar=tensor([0.0736, 0.0611, 0.0847, 0.0605, 0.1890, 0.1335, 0.0590, 0.1162], device='cuda:0'), in_proj_covar=tensor([0.0588, 0.0771, 0.0673, 0.0684, 0.0914, 0.0799, 0.0611, 0.0517], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 03:02:07,757 INFO [finetune.py:992] (0/2) Epoch 20, batch 9000, loss[loss=0.1691, simple_loss=0.2676, pruned_loss=0.03529, over 11822.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2521, pruned_loss=0.03561, over 2380747.69 frames. ], batch size: 44, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:02:07,758 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 03:02:19,174 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2217, 5.9367, 5.5964, 5.4954, 6.0561, 5.2649, 5.3174, 5.4193], device='cuda:0'), covar=tensor([0.1411, 0.0816, 0.0763, 0.1438, 0.0611, 0.2053, 0.2147, 0.1068], device='cuda:0'), in_proj_covar=tensor([0.0382, 0.0536, 0.0426, 0.0476, 0.0490, 0.0471, 0.0424, 0.0410], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:02:23,318 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6177, 2.9502, 2.2925, 2.1314, 2.6617, 2.1724, 2.8886, 2.4284], device='cuda:0'), covar=tensor([0.0728, 0.0765, 0.1084, 0.1613, 0.0327, 0.1364, 0.0585, 0.0917], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0265, 0.0181, 0.0204, 0.0148, 0.0186, 0.0204, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:02:25,417 INFO [finetune.py:1026] (0/2) Epoch 20, validation: loss=0.3184, simple_loss=0.392, pruned_loss=0.1224, over 1020973.00 frames. 2023-05-19 03:02:25,418 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12525MB 2023-05-19 03:03:00,707 INFO [finetune.py:992] (0/2) Epoch 20, batch 9050, loss[loss=0.1364, simple_loss=0.222, pruned_loss=0.02542, over 11989.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2517, pruned_loss=0.03555, over 2373836.20 frames. ], batch size: 28, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:03:08,444 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.553e+02 3.007e+02 3.703e+02 8.660e+02, threshold=6.014e+02, percent-clipped=2.0 2023-05-19 03:03:28,021 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.97 vs. limit=5.0 2023-05-19 03:03:36,064 INFO [finetune.py:992] (0/2) Epoch 20, batch 9100, loss[loss=0.1776, simple_loss=0.2747, pruned_loss=0.0403, over 11789.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03631, over 2357739.16 frames. 
], batch size: 44, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:03:41,864 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2584, 4.6249, 2.9673, 2.7044, 4.0009, 2.6728, 3.9630, 3.1256], device='cuda:0'), covar=tensor([0.0735, 0.0626, 0.1108, 0.1441, 0.0291, 0.1331, 0.0495, 0.0868], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0264, 0.0180, 0.0203, 0.0148, 0.0186, 0.0203, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:03:51,617 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0282, 4.6156, 4.0208, 4.8532, 4.3601, 2.9557, 4.1233, 3.0343], device='cuda:0'), covar=tensor([0.0960, 0.0806, 0.1496, 0.0554, 0.1264, 0.1883, 0.1147, 0.3423], device='cuda:0'), in_proj_covar=tensor([0.0321, 0.0392, 0.0374, 0.0355, 0.0382, 0.0284, 0.0360, 0.0376], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:03:59,904 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-19 03:04:10,627 INFO [finetune.py:992] (0/2) Epoch 20, batch 9150, loss[loss=0.1402, simple_loss=0.2259, pruned_loss=0.02728, over 12191.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2535, pruned_loss=0.03651, over 2361593.88 frames. ], batch size: 29, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:04:18,557 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.573e+02 3.080e+02 3.663e+02 9.571e+02, threshold=6.160e+02, percent-clipped=3.0 2023-05-19 03:04:46,443 INFO [finetune.py:992] (0/2) Epoch 20, batch 9200, loss[loss=0.1213, simple_loss=0.2109, pruned_loss=0.01585, over 11968.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.03589, over 2366943.22 frames. ], batch size: 28, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:05:20,562 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 03:05:21,427 INFO [finetune.py:992] (0/2) Epoch 20, batch 9250, loss[loss=0.1402, simple_loss=0.2346, pruned_loss=0.02289, over 12187.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2538, pruned_loss=0.03661, over 2365187.20 frames. ], batch size: 31, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:05:28,933 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.659e+02 3.200e+02 3.746e+02 7.852e+02, threshold=6.401e+02, percent-clipped=2.0 2023-05-19 03:05:56,113 INFO [finetune.py:992] (0/2) Epoch 20, batch 9300, loss[loss=0.1755, simple_loss=0.2678, pruned_loss=0.04153, over 11791.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2542, pruned_loss=0.03688, over 2356774.10 frames. ], batch size: 44, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:06:29,294 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-05-19 03:06:31,475 INFO [finetune.py:992] (0/2) Epoch 20, batch 9350, loss[loss=0.1592, simple_loss=0.2547, pruned_loss=0.03184, over 12338.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.03646, over 2360662.31 frames. 
], batch size: 36, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:06:39,946 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.662e+02 3.184e+02 3.616e+02 6.301e+02, threshold=6.367e+02, percent-clipped=0.0 2023-05-19 03:06:46,545 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.0469, 4.6975, 4.6913, 4.8630, 4.7699, 4.9597, 4.7599, 2.5315], device='cuda:0'), covar=tensor([0.0096, 0.0090, 0.0131, 0.0071, 0.0060, 0.0112, 0.0100, 0.1029], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0089, 0.0079, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:07:02,498 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=341841.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:07:07,285 INFO [finetune.py:992] (0/2) Epoch 20, batch 9400, loss[loss=0.1394, simple_loss=0.2183, pruned_loss=0.03029, over 12171.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2521, pruned_loss=0.03567, over 2370051.34 frames. ], batch size: 29, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:07:42,737 INFO [finetune.py:992] (0/2) Epoch 20, batch 9450, loss[loss=0.1697, simple_loss=0.2611, pruned_loss=0.03915, over 12079.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2528, pruned_loss=0.03592, over 2360448.42 frames. ], batch size: 32, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:07:45,770 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=341902.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:07:50,489 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.708e+02 3.154e+02 3.636e+02 6.774e+02, threshold=6.308e+02, percent-clipped=1.0 2023-05-19 03:08:02,661 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-19 03:08:06,226 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-19 03:08:17,044 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2632, 6.1937, 5.6395, 5.7035, 6.2463, 5.4782, 5.6298, 5.7379], device='cuda:0'), covar=tensor([0.1368, 0.0857, 0.1072, 0.1819, 0.0873, 0.2302, 0.1834, 0.1082], device='cuda:0'), in_proj_covar=tensor([0.0382, 0.0533, 0.0428, 0.0474, 0.0493, 0.0473, 0.0428, 0.0412], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:08:17,671 INFO [finetune.py:992] (0/2) Epoch 20, batch 9500, loss[loss=0.1621, simple_loss=0.2527, pruned_loss=0.03574, over 12271.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2517, pruned_loss=0.03547, over 2363623.16 frames. 
], batch size: 28, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:08:46,080 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=341988.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:08:46,733 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1970, 5.0397, 5.1799, 5.1956, 4.8140, 4.9062, 4.6266, 5.1427], device='cuda:0'), covar=tensor([0.0846, 0.0656, 0.0970, 0.0701, 0.2123, 0.1354, 0.0626, 0.1124], device='cuda:0'), in_proj_covar=tensor([0.0589, 0.0774, 0.0673, 0.0687, 0.0921, 0.0802, 0.0616, 0.0520], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 03:08:52,731 INFO [finetune.py:992] (0/2) Epoch 20, batch 9550, loss[loss=0.1662, simple_loss=0.2573, pruned_loss=0.03756, over 12271.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2523, pruned_loss=0.03552, over 2365595.97 frames. ], batch size: 37, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:08:54,332 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-242000.pt 2023-05-19 03:08:59,203 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6058, 4.9993, 3.2528, 2.8285, 4.3424, 2.9001, 4.1980, 3.5905], device='cuda:0'), covar=tensor([0.0693, 0.0472, 0.1117, 0.1496, 0.0283, 0.1286, 0.0482, 0.0777], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0267, 0.0182, 0.0205, 0.0149, 0.0187, 0.0204, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:09:03,123 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.794e+02 3.289e+02 4.342e+02 9.721e+02, threshold=6.577e+02, percent-clipped=4.0 2023-05-19 03:09:18,757 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8377, 3.5059, 5.2983, 2.7989, 3.1285, 4.0793, 3.3735, 3.8800], device='cuda:0'), covar=tensor([0.0475, 0.1223, 0.0249, 0.1142, 0.1786, 0.1319, 0.1389, 0.1295], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0244, 0.0270, 0.0191, 0.0242, 0.0300, 0.0233, 0.0277], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:09:30,915 INFO [finetune.py:992] (0/2) Epoch 20, batch 9600, loss[loss=0.1434, simple_loss=0.2329, pruned_loss=0.02691, over 12353.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.252, pruned_loss=0.03558, over 2357743.21 frames. ], batch size: 31, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:09:31,836 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:09:32,526 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.0021, 3.8226, 3.4792, 3.3771, 3.1454, 3.0111, 3.8571, 2.6519], device='cuda:0'), covar=tensor([0.0358, 0.0165, 0.0200, 0.0235, 0.0370, 0.0391, 0.0137, 0.0458], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0171, 0.0178, 0.0203, 0.0208, 0.0206, 0.0183, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:10:06,484 INFO [finetune.py:992] (0/2) Epoch 20, batch 9650, loss[loss=0.1677, simple_loss=0.2632, pruned_loss=0.03608, over 12040.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2521, pruned_loss=0.03511, over 2360293.68 frames. 
], batch size: 42, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:10:14,313 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.471e+02 2.894e+02 3.503e+02 1.028e+03, threshold=5.788e+02, percent-clipped=2.0 2023-05-19 03:10:41,424 INFO [finetune.py:992] (0/2) Epoch 20, batch 9700, loss[loss=0.1554, simple_loss=0.2486, pruned_loss=0.03113, over 12189.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.252, pruned_loss=0.03528, over 2365152.18 frames. ], batch size: 35, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:10:44,704 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-05-19 03:10:53,565 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2483, 2.5284, 3.6852, 3.1390, 3.6000, 3.2483, 2.5857, 3.6031], device='cuda:0'), covar=tensor([0.0160, 0.0463, 0.0210, 0.0311, 0.0178, 0.0249, 0.0448, 0.0154], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0217, 0.0207, 0.0202, 0.0236, 0.0181, 0.0211, 0.0209], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:11:13,764 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=2.57 vs. limit=5.0 2023-05-19 03:11:16,293 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342197.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:11:16,898 INFO [finetune.py:992] (0/2) Epoch 20, batch 9750, loss[loss=0.1644, simple_loss=0.2593, pruned_loss=0.03475, over 12149.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2522, pruned_loss=0.03526, over 2364714.12 frames. ], batch size: 36, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:11:24,913 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.576e+02 3.145e+02 4.023e+02 1.000e+03, threshold=6.290e+02, percent-clipped=4.0 2023-05-19 03:11:52,284 INFO [finetune.py:992] (0/2) Epoch 20, batch 9800, loss[loss=0.1923, simple_loss=0.2853, pruned_loss=0.04962, over 12268.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2522, pruned_loss=0.03517, over 2373061.14 frames. ], batch size: 37, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:12:13,923 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.5588, 5.4195, 5.5082, 5.5666, 5.1507, 5.2049, 4.9831, 5.5121], device='cuda:0'), covar=tensor([0.0797, 0.0614, 0.0768, 0.0561, 0.1931, 0.1274, 0.0612, 0.0960], device='cuda:0'), in_proj_covar=tensor([0.0589, 0.0778, 0.0672, 0.0687, 0.0928, 0.0806, 0.0618, 0.0521], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 03:12:14,722 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.7959, 4.0811, 3.5508, 4.2448, 3.8125, 2.7622, 3.6502, 2.8688], device='cuda:0'), covar=tensor([0.0862, 0.0891, 0.1547, 0.0706, 0.1428, 0.1767, 0.1261, 0.3285], device='cuda:0'), in_proj_covar=tensor([0.0320, 0.0390, 0.0373, 0.0356, 0.0383, 0.0284, 0.0359, 0.0375], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:12:15,615 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-19 03:12:27,137 INFO [finetune.py:992] (0/2) Epoch 20, batch 9850, loss[loss=0.2124, simple_loss=0.2926, pruned_loss=0.06606, over 8709.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2521, pruned_loss=0.03527, over 2370889.77 frames. 
], batch size: 98, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:12:35,019 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.707e+02 2.626e+02 3.190e+02 3.767e+02 5.827e+02, threshold=6.380e+02, percent-clipped=0.0 2023-05-19 03:12:36,487 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342311.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:12:43,292 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-19 03:12:49,396 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.72 vs. limit=2.0 2023-05-19 03:13:00,199 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342344.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:13:02,725 INFO [finetune.py:992] (0/2) Epoch 20, batch 9900, loss[loss=0.1993, simple_loss=0.2829, pruned_loss=0.05787, over 12138.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2515, pruned_loss=0.03492, over 2373268.57 frames. ], batch size: 34, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:13:06,376 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3786, 2.2916, 3.5378, 4.2525, 3.7455, 4.2804, 3.8719, 3.1595], device='cuda:0'), covar=tensor([0.0053, 0.0494, 0.0165, 0.0074, 0.0155, 0.0096, 0.0140, 0.0385], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0126, 0.0108, 0.0086, 0.0110, 0.0121, 0.0108, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:13:08,041 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-05-19 03:13:19,546 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342372.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:13:38,033 INFO [finetune.py:992] (0/2) Epoch 20, batch 9950, loss[loss=0.1731, simple_loss=0.26, pruned_loss=0.04311, over 12315.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2515, pruned_loss=0.03519, over 2380335.58 frames. ], batch size: 34, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:13:46,013 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.702e+02 3.172e+02 3.899e+02 3.372e+03, threshold=6.344e+02, percent-clipped=3.0 2023-05-19 03:14:02,568 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1568, 3.9462, 2.5627, 2.4243, 3.4786, 2.3972, 3.5464, 2.9016], device='cuda:0'), covar=tensor([0.0679, 0.0749, 0.1315, 0.1590, 0.0400, 0.1363, 0.0618, 0.0823], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0265, 0.0180, 0.0202, 0.0148, 0.0185, 0.0202, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:14:03,722 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.79 vs. limit=5.0 2023-05-19 03:14:07,911 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6941, 4.6715, 4.5094, 4.1898, 4.2632, 4.6546, 4.3876, 4.1279], device='cuda:0'), covar=tensor([0.0860, 0.0945, 0.0777, 0.1561, 0.2074, 0.0870, 0.1390, 0.1142], device='cuda:0'), in_proj_covar=tensor([0.0668, 0.0605, 0.0562, 0.0684, 0.0458, 0.0790, 0.0838, 0.0601], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:0') 2023-05-19 03:14:12,605 INFO [finetune.py:992] (0/2) Epoch 20, batch 10000, loss[loss=0.2271, simple_loss=0.3129, pruned_loss=0.07063, over 7992.00 frames. 
], tot_loss[loss=0.1612, simple_loss=0.2513, pruned_loss=0.03555, over 2372659.61 frames. ], batch size: 98, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:14:47,353 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=342497.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:14:47,884 INFO [finetune.py:992] (0/2) Epoch 20, batch 10050, loss[loss=0.1525, simple_loss=0.2392, pruned_loss=0.03285, over 12076.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.251, pruned_loss=0.03539, over 2380369.83 frames. ], batch size: 32, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:14:55,629 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.629e+02 3.145e+02 3.868e+02 7.423e+02, threshold=6.290e+02, percent-clipped=1.0 2023-05-19 03:14:57,876 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.9155, 4.5593, 4.6868, 4.7666, 4.6610, 4.8137, 4.5839, 2.6431], device='cuda:0'), covar=tensor([0.0081, 0.0072, 0.0091, 0.0063, 0.0053, 0.0099, 0.0081, 0.0829], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0087, 0.0090, 0.0079, 0.0066, 0.0101, 0.0088, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:14:57,951 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.4552, 3.1622, 4.9283, 2.6776, 2.8393, 3.5979, 3.1087, 3.6565], device='cuda:0'), covar=tensor([0.0555, 0.1289, 0.0403, 0.1198, 0.1947, 0.1578, 0.1424, 0.1331], device='cuda:0'), in_proj_covar=tensor([0.0247, 0.0245, 0.0272, 0.0192, 0.0244, 0.0302, 0.0234, 0.0279], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:15:09,651 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4253, 3.5670, 3.2431, 3.7673, 3.4627, 2.6417, 3.2509, 2.8756], device='cuda:0'), covar=tensor([0.0986, 0.1274, 0.1731, 0.0782, 0.1482, 0.1899, 0.1449, 0.3017], device='cuda:0'), in_proj_covar=tensor([0.0322, 0.0394, 0.0374, 0.0358, 0.0385, 0.0286, 0.0362, 0.0377], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:15:11,084 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 03:15:14,354 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342535.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:17,655 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342540.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:21,109 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=342545.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:23,109 INFO [finetune.py:992] (0/2) Epoch 20, batch 10100, loss[loss=0.1696, simple_loss=0.2605, pruned_loss=0.0394, over 12363.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.252, pruned_loss=0.03592, over 2368733.41 frames. ], batch size: 38, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:15:56,362 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342596.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:57,531 INFO [finetune.py:992] (0/2) Epoch 20, batch 10150, loss[loss=0.1863, simple_loss=0.2725, pruned_loss=0.05007, over 10831.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2524, pruned_loss=0.0357, over 2370783.66 frames. 
], batch size: 68, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:16:00,059 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342601.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:16:05,442 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.782e+02 3.170e+02 3.709e+02 6.976e+02, threshold=6.340e+02, percent-clipped=1.0 2023-05-19 03:16:30,182 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=342644.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:16:32,841 INFO [finetune.py:992] (0/2) Epoch 20, batch 10200, loss[loss=0.1454, simple_loss=0.2339, pruned_loss=0.02845, over 12172.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.251, pruned_loss=0.03529, over 2373796.20 frames. ], batch size: 31, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:16:46,097 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342667.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 03:17:04,232 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=342692.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:17:05,067 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342693.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:17:08,504 INFO [finetune.py:992] (0/2) Epoch 20, batch 10250, loss[loss=0.144, simple_loss=0.2288, pruned_loss=0.02957, over 12127.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2506, pruned_loss=0.03491, over 2377214.15 frames. ], batch size: 30, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:17:09,340 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.1536, 3.9206, 4.1075, 3.8056, 3.9726, 3.8233, 4.1355, 3.6696], device='cuda:0'), covar=tensor([0.0355, 0.0403, 0.0351, 0.0264, 0.0381, 0.0338, 0.0256, 0.1398], device='cuda:0'), in_proj_covar=tensor([0.0293, 0.0293, 0.0320, 0.0289, 0.0287, 0.0289, 0.0266, 0.0239], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:17:16,064 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.540e+02 2.941e+02 3.617e+02 1.365e+03, threshold=5.881e+02, percent-clipped=2.0 2023-05-19 03:17:43,263 INFO [finetune.py:992] (0/2) Epoch 20, batch 10300, loss[loss=0.1813, simple_loss=0.271, pruned_loss=0.04576, over 11763.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2513, pruned_loss=0.03539, over 2380017.90 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:17:47,645 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342754.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:18:08,757 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-19 03:18:10,492 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.4912, 5.0001, 5.4593, 4.7735, 5.1334, 4.8458, 5.5038, 5.1333], device='cuda:0'), covar=tensor([0.0271, 0.0391, 0.0259, 0.0261, 0.0394, 0.0355, 0.0193, 0.0336], device='cuda:0'), in_proj_covar=tensor([0.0293, 0.0292, 0.0320, 0.0288, 0.0286, 0.0288, 0.0266, 0.0239], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:18:18,770 INFO [finetune.py:992] (0/2) Epoch 20, batch 10350, loss[loss=0.1584, simple_loss=0.2521, pruned_loss=0.03233, over 12364.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2517, pruned_loss=0.03572, over 2376974.11 frames. 
], batch size: 36, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:18:26,864 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.836e+02 3.182e+02 3.621e+02 7.914e+02, threshold=6.363e+02, percent-clipped=3.0 2023-05-19 03:18:29,809 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342813.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:18:36,335 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-19 03:18:54,497 INFO [finetune.py:992] (0/2) Epoch 20, batch 10400, loss[loss=0.1906, simple_loss=0.2797, pruned_loss=0.05075, over 11745.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2515, pruned_loss=0.0356, over 2378914.64 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:19:12,810 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342874.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 03:19:21,101 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.1103, 5.9338, 5.5351, 5.4655, 6.0834, 5.3189, 5.5216, 5.4725], device='cuda:0'), covar=tensor([0.1513, 0.0912, 0.1054, 0.1937, 0.0860, 0.1925, 0.1852, 0.1157], device='cuda:0'), in_proj_covar=tensor([0.0381, 0.0533, 0.0427, 0.0474, 0.0490, 0.0470, 0.0426, 0.0409], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:19:24,557 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342891.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:19:28,036 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342896.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:19:29,313 INFO [finetune.py:992] (0/2) Epoch 20, batch 10450, loss[loss=0.1604, simple_loss=0.2436, pruned_loss=0.03866, over 12296.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2509, pruned_loss=0.03522, over 2379181.78 frames. ], batch size: 33, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:19:36,972 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.491e+02 2.496e+02 2.961e+02 3.391e+02 1.096e+03, threshold=5.923e+02, percent-clipped=0.0 2023-05-19 03:20:04,788 INFO [finetune.py:992] (0/2) Epoch 20, batch 10500, loss[loss=0.1809, simple_loss=0.2721, pruned_loss=0.04489, over 10542.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2502, pruned_loss=0.03501, over 2378871.85 frames. ], batch size: 68, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:20:17,924 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=342967.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 03:20:39,815 INFO [finetune.py:992] (0/2) Epoch 20, batch 10550, loss[loss=0.2303, simple_loss=0.3089, pruned_loss=0.0759, over 7809.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2516, pruned_loss=0.03552, over 2373850.23 frames. 
], batch size: 99, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:20:47,698 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.586e+02 3.115e+02 3.687e+02 6.303e+02, threshold=6.229e+02, percent-clipped=2.0 2023-05-19 03:20:48,011 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7141, 3.1873, 5.1399, 2.7693, 2.8934, 3.9011, 3.1953, 3.8120], device='cuda:0'), covar=tensor([0.0466, 0.1289, 0.0407, 0.1175, 0.1962, 0.1659, 0.1407, 0.1206], device='cuda:0'), in_proj_covar=tensor([0.0249, 0.0246, 0.0273, 0.0193, 0.0246, 0.0305, 0.0235, 0.0280], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:20:52,042 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343015.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:21:15,075 INFO [finetune.py:992] (0/2) Epoch 20, batch 10600, loss[loss=0.1401, simple_loss=0.2361, pruned_loss=0.02205, over 12264.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2512, pruned_loss=0.03521, over 2374690.95 frames. ], batch size: 32, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:21:15,897 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=343049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:21:28,958 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=343068.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:21:51,285 INFO [finetune.py:992] (0/2) Epoch 20, batch 10650, loss[loss=0.1561, simple_loss=0.2385, pruned_loss=0.03684, over 12363.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2509, pruned_loss=0.035, over 2373959.35 frames. ], batch size: 30, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:21:58,822 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.486e+02 2.923e+02 3.502e+02 1.456e+03, threshold=5.845e+02, percent-clipped=2.0 2023-05-19 03:22:13,828 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=343129.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:22:26,623 INFO [finetune.py:992] (0/2) Epoch 20, batch 10700, loss[loss=0.1494, simple_loss=0.2375, pruned_loss=0.03068, over 12113.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2517, pruned_loss=0.03536, over 2369278.39 frames. ], batch size: 30, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:22:41,195 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=343169.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 03:22:56,407 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343191.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:00,024 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343196.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:01,321 INFO [finetune.py:992] (0/2) Epoch 20, batch 10750, loss[loss=0.1604, simple_loss=0.2537, pruned_loss=0.03356, over 11796.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2523, pruned_loss=0.03529, over 2373347.26 frames. 
], batch size: 44, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:23:07,466 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3224, 4.6148, 2.8836, 2.6345, 4.1023, 2.6522, 3.8093, 3.1861], device='cuda:0'), covar=tensor([0.0781, 0.0647, 0.1297, 0.1617, 0.0303, 0.1383, 0.0662, 0.0902], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0267, 0.0181, 0.0205, 0.0149, 0.0186, 0.0205, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:23:09,308 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.782e+02 3.148e+02 3.630e+02 1.010e+03, threshold=6.296e+02, percent-clipped=3.0 2023-05-19 03:23:30,564 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343239.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:34,103 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343244.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:36,809 INFO [finetune.py:992] (0/2) Epoch 20, batch 10800, loss[loss=0.1412, simple_loss=0.2248, pruned_loss=0.02879, over 12348.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2532, pruned_loss=0.03593, over 2365403.06 frames. ], batch size: 31, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:23:42,690 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.7157, 3.3276, 5.1706, 2.6217, 2.7737, 3.8035, 3.0682, 3.8860], device='cuda:0'), covar=tensor([0.0471, 0.1243, 0.0270, 0.1231, 0.2083, 0.1460, 0.1524, 0.1097], device='cuda:0'), in_proj_covar=tensor([0.0248, 0.0246, 0.0273, 0.0193, 0.0246, 0.0304, 0.0234, 0.0280], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:24:12,046 INFO [finetune.py:992] (0/2) Epoch 20, batch 10850, loss[loss=0.1592, simple_loss=0.254, pruned_loss=0.03221, over 12148.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2524, pruned_loss=0.03561, over 2372289.65 frames. ], batch size: 34, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:24:19,437 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.628e+02 2.620e+02 3.050e+02 3.784e+02 5.779e+02, threshold=6.100e+02, percent-clipped=0.0 2023-05-19 03:24:47,528 INFO [finetune.py:992] (0/2) Epoch 20, batch 10900, loss[loss=0.188, simple_loss=0.284, pruned_loss=0.04597, over 11569.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2527, pruned_loss=0.03586, over 2372323.66 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:24:48,389 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343349.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:25:04,068 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0212, 2.1995, 3.0053, 3.8954, 2.3420, 3.9683, 3.8927, 4.1116], device='cuda:0'), covar=tensor([0.0144, 0.1534, 0.0540, 0.0186, 0.1308, 0.0319, 0.0277, 0.0128], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0210, 0.0189, 0.0130, 0.0191, 0.0188, 0.0188, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 03:25:21,808 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343397.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:25:22,445 INFO [finetune.py:992] (0/2) Epoch 20, batch 10950, loss[loss=0.1562, simple_loss=0.2359, pruned_loss=0.03826, over 11994.00 frames. 
], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03632, over 2362647.92 frames. ], batch size: 28, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:25:30,887 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.591e+02 3.071e+02 3.722e+02 9.742e+02, threshold=6.142e+02, percent-clipped=3.0 2023-05-19 03:25:41,154 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=343424.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:25:41,192 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.2298, 4.9955, 5.2659, 5.2485, 4.4681, 4.6243, 4.7401, 5.0163], device='cuda:0'), covar=tensor([0.1215, 0.1299, 0.1053, 0.1023, 0.3668, 0.2501, 0.0752, 0.1898], device='cuda:0'), in_proj_covar=tensor([0.0588, 0.0769, 0.0669, 0.0682, 0.0920, 0.0801, 0.0611, 0.0514], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:0') 2023-05-19 03:25:41,316 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6715, 3.4312, 5.2237, 2.6016, 2.9376, 3.8009, 3.2326, 3.9579], device='cuda:0'), covar=tensor([0.0475, 0.1139, 0.0265, 0.1265, 0.1924, 0.1601, 0.1428, 0.1017], device='cuda:0'), in_proj_covar=tensor([0.0247, 0.0245, 0.0271, 0.0192, 0.0244, 0.0302, 0.0233, 0.0278], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:25:57,576 INFO [finetune.py:992] (0/2) Epoch 20, batch 11000, loss[loss=0.1614, simple_loss=0.2516, pruned_loss=0.03561, over 12115.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2539, pruned_loss=0.03713, over 2352109.09 frames. ], batch size: 33, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:25:58,532 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7514, 2.9332, 4.3320, 4.5114, 2.9894, 2.6308, 3.0006, 2.1854], device='cuda:0'), covar=tensor([0.1797, 0.2732, 0.0549, 0.0483, 0.1356, 0.2796, 0.2745, 0.4168], device='cuda:0'), in_proj_covar=tensor([0.0321, 0.0403, 0.0288, 0.0317, 0.0290, 0.0334, 0.0414, 0.0389], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:26:12,278 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343469.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:26:15,091 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.3937, 4.2351, 4.3787, 4.6970, 3.1097, 4.1361, 2.7732, 4.4305], device='cuda:0'), covar=tensor([0.1473, 0.0653, 0.0728, 0.0543, 0.1136, 0.0608, 0.1689, 0.1071], device='cuda:0'), in_proj_covar=tensor([0.0237, 0.0280, 0.0306, 0.0372, 0.0250, 0.0250, 0.0269, 0.0379], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:26:22,633 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1561, 2.2036, 2.6910, 3.1738, 2.2570, 3.2244, 3.1145, 3.3443], device='cuda:0'), covar=tensor([0.0226, 0.1245, 0.0549, 0.0230, 0.1089, 0.0408, 0.0439, 0.0177], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0210, 0.0189, 0.0130, 0.0190, 0.0187, 0.0189, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 03:26:32,799 INFO [finetune.py:992] (0/2) Epoch 20, batch 11050, loss[loss=0.159, simple_loss=0.2298, pruned_loss=0.04412, over 12014.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2573, pruned_loss=0.03905, over 2320986.22 frames. 
], batch size: 28, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:26:40,290 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 3.090e+02 3.545e+02 4.332e+02 7.495e+02, threshold=7.090e+02, percent-clipped=4.0 2023-05-19 03:26:46,598 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343517.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:26:47,437 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.0149, 4.5615, 3.8963, 4.7905, 4.2854, 2.9410, 4.0008, 3.0093], device='cuda:0'), covar=tensor([0.0995, 0.0733, 0.1586, 0.0626, 0.1318, 0.1862, 0.1136, 0.3300], device='cuda:0'), in_proj_covar=tensor([0.0320, 0.0391, 0.0372, 0.0355, 0.0384, 0.0284, 0.0358, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:27:05,791 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4274, 4.7860, 2.9429, 2.4366, 4.2449, 2.6084, 4.0105, 3.2502], device='cuda:0'), covar=tensor([0.0664, 0.0535, 0.1229, 0.2101, 0.0284, 0.1717, 0.0580, 0.1000], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0267, 0.0181, 0.0204, 0.0149, 0.0186, 0.0204, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:27:05,812 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1723, 2.6561, 3.6969, 3.1412, 3.5664, 3.2785, 2.6367, 3.6340], device='cuda:0'), covar=tensor([0.0200, 0.0423, 0.0191, 0.0295, 0.0180, 0.0219, 0.0452, 0.0160], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0218, 0.0209, 0.0204, 0.0237, 0.0180, 0.0213, 0.0210], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:27:07,664 INFO [finetune.py:992] (0/2) Epoch 20, batch 11100, loss[loss=0.2158, simple_loss=0.3027, pruned_loss=0.06444, over 10458.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2622, pruned_loss=0.04164, over 2268575.52 frames. ], batch size: 69, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:27:22,736 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.8996, 3.7212, 3.3274, 3.2011, 2.8999, 2.8281, 3.6857, 2.3421], device='cuda:0'), covar=tensor([0.0351, 0.0157, 0.0228, 0.0233, 0.0451, 0.0402, 0.0155, 0.0633], device='cuda:0'), in_proj_covar=tensor([0.0203, 0.0174, 0.0181, 0.0205, 0.0211, 0.0208, 0.0184, 0.0216], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:27:42,206 INFO [finetune.py:992] (0/2) Epoch 20, batch 11150, loss[loss=0.2445, simple_loss=0.3222, pruned_loss=0.08342, over 7044.00 frames. ], tot_loss[loss=0.1783, simple_loss=0.2672, pruned_loss=0.04466, over 2214428.33 frames. 
], batch size: 99, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:27:50,621 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.181e+02 3.272e+02 3.831e+02 4.519e+02 7.897e+02, threshold=7.662e+02, percent-clipped=5.0 2023-05-19 03:27:56,441 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.6419, 3.3553, 5.0607, 2.6539, 3.0354, 3.8294, 3.2950, 3.8675], device='cuda:0'), covar=tensor([0.0457, 0.1182, 0.0334, 0.1289, 0.1837, 0.1586, 0.1363, 0.1172], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0242, 0.0268, 0.0190, 0.0241, 0.0298, 0.0230, 0.0275], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:28:02,526 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.1375, 3.9161, 2.7070, 2.4578, 3.5390, 2.4268, 3.6510, 2.8695], device='cuda:0'), covar=tensor([0.0673, 0.0506, 0.1124, 0.1611, 0.0248, 0.1429, 0.0467, 0.0829], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0264, 0.0180, 0.0203, 0.0148, 0.0185, 0.0203, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:28:16,999 INFO [finetune.py:992] (0/2) Epoch 20, batch 11200, loss[loss=0.1816, simple_loss=0.2845, pruned_loss=0.03939, over 12344.00 frames. ], tot_loss[loss=0.1846, simple_loss=0.273, pruned_loss=0.04812, over 2153474.22 frames. ], batch size: 35, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:28:32,955 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 03:28:51,743 INFO [finetune.py:992] (0/2) Epoch 20, batch 11250, loss[loss=0.272, simple_loss=0.3402, pruned_loss=0.1019, over 6632.00 frames. ], tot_loss[loss=0.1918, simple_loss=0.2793, pruned_loss=0.0522, over 2090235.01 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:28:58,900 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.331e+02 3.392e+02 3.781e+02 4.930e+02 8.926e+02, threshold=7.563e+02, percent-clipped=5.0 2023-05-19 03:29:08,428 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6418, 2.3366, 3.4174, 3.3891, 3.5837, 3.6573, 3.6539, 2.7042], device='cuda:0'), covar=tensor([0.0077, 0.0477, 0.0175, 0.0120, 0.0123, 0.0112, 0.0107, 0.0479], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0126, 0.0108, 0.0086, 0.0111, 0.0122, 0.0108, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:29:09,019 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343724.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:29:25,776 INFO [finetune.py:992] (0/2) Epoch 20, batch 11300, loss[loss=0.204, simple_loss=0.2958, pruned_loss=0.05606, over 10342.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2851, pruned_loss=0.05563, over 2045402.75 frames. 
], batch size: 69, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:29:41,945 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343772.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:29:56,110 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([1.9634, 2.1517, 2.1969, 2.0512, 1.8441, 2.0156, 1.9894, 1.5997], device='cuda:0'), covar=tensor([0.0365, 0.0171, 0.0197, 0.0225, 0.0339, 0.0224, 0.0205, 0.0451], device='cuda:0'), in_proj_covar=tensor([0.0202, 0.0173, 0.0180, 0.0204, 0.0211, 0.0206, 0.0184, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:29:59,080 INFO [finetune.py:992] (0/2) Epoch 20, batch 11350, loss[loss=0.2241, simple_loss=0.3115, pruned_loss=0.06836, over 7430.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2899, pruned_loss=0.05841, over 1982521.99 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:30:07,464 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.355e+02 3.459e+02 3.988e+02 4.868e+02 7.641e+02, threshold=7.976e+02, percent-clipped=1.0 2023-05-19 03:30:33,784 INFO [finetune.py:992] (0/2) Epoch 20, batch 11400, loss[loss=0.2091, simple_loss=0.3023, pruned_loss=0.05795, over 10317.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2939, pruned_loss=0.06125, over 1934474.71 frames. ], batch size: 68, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:30:48,324 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.9148, 4.3269, 3.7708, 4.6296, 4.1143, 2.7315, 3.7719, 2.9400], device='cuda:0'), covar=tensor([0.0947, 0.0777, 0.1511, 0.0539, 0.1489, 0.2008, 0.1345, 0.3484], device='cuda:0'), in_proj_covar=tensor([0.0312, 0.0379, 0.0362, 0.0344, 0.0374, 0.0277, 0.0350, 0.0364], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2023-05-19 03:30:55,929 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8654, 5.6196, 5.2753, 5.1951, 5.7186, 4.9201, 5.1001, 5.1211], device='cuda:0'), covar=tensor([0.1346, 0.0876, 0.0948, 0.1565, 0.0758, 0.2002, 0.1950, 0.0996], device='cuda:0'), in_proj_covar=tensor([0.0370, 0.0514, 0.0418, 0.0457, 0.0477, 0.0454, 0.0413, 0.0395], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:31:01,133 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=343888.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:31:07,525 INFO [finetune.py:992] (0/2) Epoch 20, batch 11450, loss[loss=0.27, simple_loss=0.3435, pruned_loss=0.09822, over 7099.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2985, pruned_loss=0.06491, over 1874234.11 frames. 
], batch size: 99, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:31:09,784 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([5.3022, 5.2685, 5.0853, 4.6887, 4.6804, 5.2348, 4.9426, 4.7514], device='cuda:0'), covar=tensor([0.0727, 0.0894, 0.0694, 0.1636, 0.1253, 0.0771, 0.1386, 0.0944], device='cuda:0'), in_proj_covar=tensor([0.0633, 0.0575, 0.0533, 0.0644, 0.0435, 0.0749, 0.0789, 0.0570], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 03:31:14,911 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.474e+02 3.415e+02 3.844e+02 4.424e+02 1.075e+03, threshold=7.689e+02, percent-clipped=3.0 2023-05-19 03:31:41,719 INFO [finetune.py:992] (0/2) Epoch 20, batch 11500, loss[loss=0.2591, simple_loss=0.3271, pruned_loss=0.09555, over 6570.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.3003, pruned_loss=0.0665, over 1850278.73 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:31:42,538 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=343949.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:31:43,872 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.2864, 2.9631, 3.0742, 3.2595, 2.5858, 3.0396, 2.5479, 2.5497], device='cuda:0'), covar=tensor([0.1574, 0.1043, 0.0857, 0.0525, 0.1051, 0.0870, 0.1756, 0.0499], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0276, 0.0302, 0.0365, 0.0246, 0.0246, 0.0265, 0.0374], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:31:53,650 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.8825, 4.6047, 4.8919, 4.4111, 4.6236, 4.4394, 4.8632, 4.5887], device='cuda:0'), covar=tensor([0.0300, 0.0382, 0.0271, 0.0263, 0.0431, 0.0365, 0.0252, 0.0401], device='cuda:0'), in_proj_covar=tensor([0.0280, 0.0279, 0.0306, 0.0276, 0.0276, 0.0277, 0.0256, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:32:15,428 INFO [finetune.py:992] (0/2) Epoch 20, batch 11550, loss[loss=0.2387, simple_loss=0.3082, pruned_loss=0.08459, over 6975.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.3022, pruned_loss=0.06835, over 1817611.61 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:32:17,021 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/checkpoint-244000.pt 2023-05-19 03:32:25,564 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.444e+02 3.824e+02 4.615e+02 8.446e+02, threshold=7.648e+02, percent-clipped=4.0 2023-05-19 03:32:51,713 INFO [finetune.py:992] (0/2) Epoch 20, batch 11600, loss[loss=0.2222, simple_loss=0.3125, pruned_loss=0.06597, over 10312.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.3032, pruned_loss=0.06959, over 1788290.41 frames. 
], batch size: 68, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:33:00,436 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2825, 4.6411, 2.8715, 2.3501, 4.0121, 2.5406, 4.0691, 3.2123], device='cuda:0'), covar=tensor([0.0706, 0.0465, 0.1287, 0.2086, 0.0270, 0.1483, 0.0370, 0.0868], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0255, 0.0175, 0.0198, 0.0143, 0.0180, 0.0196, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:33:27,054 INFO [finetune.py:992] (0/2) Epoch 20, batch 11650, loss[loss=0.2052, simple_loss=0.2941, pruned_loss=0.05818, over 10398.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.3028, pruned_loss=0.06976, over 1771225.99 frames. ], batch size: 68, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:33:34,722 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.379e+02 3.969e+02 4.667e+02 7.645e+02, threshold=7.937e+02, percent-clipped=0.0 2023-05-19 03:33:45,891 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.5674, 3.2325, 3.4910, 3.6331, 3.6023, 3.6610, 3.5077, 2.6239], device='cuda:0'), covar=tensor([0.0094, 0.0212, 0.0180, 0.0102, 0.0092, 0.0148, 0.0110, 0.0905], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0086, 0.0090, 0.0079, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2023-05-19 03:34:01,201 INFO [finetune.py:992] (0/2) Epoch 20, batch 11700, loss[loss=0.2136, simple_loss=0.2982, pruned_loss=0.06454, over 11474.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.3015, pruned_loss=0.0698, over 1749080.35 frames. ], batch size: 48, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:34:16,936 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=344172.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:34:17,789 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-05-19 03:34:35,123 INFO [finetune.py:992] (0/2) Epoch 20, batch 11750, loss[loss=0.1929, simple_loss=0.2806, pruned_loss=0.0526, over 11995.00 frames. ], tot_loss[loss=0.221, simple_loss=0.3016, pruned_loss=0.07013, over 1735599.82 frames. ], batch size: 40, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:34:42,388 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.504e+02 3.507e+02 3.969e+02 4.748e+02 1.022e+03, threshold=7.938e+02, percent-clipped=5.0 2023-05-19 03:34:58,266 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=344233.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:35:05,823 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=344244.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:35:08,358 INFO [finetune.py:992] (0/2) Epoch 20, batch 11800, loss[loss=0.1975, simple_loss=0.2884, pruned_loss=0.05328, over 11141.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.3036, pruned_loss=0.07162, over 1718132.40 frames. ], batch size: 55, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:35:41,984 INFO [finetune.py:992] (0/2) Epoch 20, batch 11850, loss[loss=0.245, simple_loss=0.3191, pruned_loss=0.0855, over 6513.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.3047, pruned_loss=0.07191, over 1712148.56 frames. 
], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:35:49,729 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.265e+02 3.358e+02 3.867e+02 4.638e+02 9.968e+02, threshold=7.734e+02, percent-clipped=3.0 2023-05-19 03:36:15,906 INFO [finetune.py:992] (0/2) Epoch 20, batch 11900, loss[loss=0.2122, simple_loss=0.302, pruned_loss=0.06119, over 11810.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.3047, pruned_loss=0.07122, over 1713684.98 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:36:17,435 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=344350.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:36:49,627 INFO [finetune.py:992] (0/2) Epoch 20, batch 11950, loss[loss=0.1956, simple_loss=0.2825, pruned_loss=0.05433, over 7139.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.3013, pruned_loss=0.06869, over 1696882.15 frames. ], batch size: 101, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:36:50,040 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. limit=2.0 2023-05-19 03:36:57,297 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 3.070e+02 3.404e+02 4.045e+02 7.617e+02, threshold=6.808e+02, percent-clipped=0.0 2023-05-19 03:36:58,856 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=344411.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:37:12,405 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.8801, 3.8602, 3.9022, 3.9440, 3.7523, 3.7942, 3.6985, 3.8387], device='cuda:0'), covar=tensor([0.1105, 0.0748, 0.1291, 0.0719, 0.1555, 0.1200, 0.0579, 0.0981], device='cuda:0'), in_proj_covar=tensor([0.0537, 0.0708, 0.0616, 0.0629, 0.0836, 0.0732, 0.0562, 0.0475], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:37:24,243 INFO [finetune.py:992] (0/2) Epoch 20, batch 12000, loss[loss=0.1898, simple_loss=0.2742, pruned_loss=0.05271, over 6934.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2973, pruned_loss=0.06552, over 1691511.00 frames. ], batch size: 101, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:37:24,244 INFO [finetune.py:1017] (0/2) Computing validation loss 2023-05-19 03:37:30,217 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([6.0076, 5.9246, 5.9184, 5.4007, 5.2280, 5.9983, 5.6067, 5.6433], device='cuda:0'), covar=tensor([0.0758, 0.1171, 0.0590, 0.1493, 0.0498, 0.0615, 0.1125, 0.0725], device='cuda:0'), in_proj_covar=tensor([0.0616, 0.0561, 0.0518, 0.0624, 0.0424, 0.0723, 0.0762, 0.0553], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2023-05-19 03:37:31,413 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6002, 4.8141, 3.2424, 2.9937, 4.2179, 2.9213, 4.1991, 3.4518], device='cuda:0'), covar=tensor([0.0652, 0.0414, 0.1099, 0.1509, 0.0225, 0.1368, 0.0384, 0.0833], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0250, 0.0173, 0.0196, 0.0141, 0.0179, 0.0193, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2023-05-19 03:37:41,836 INFO [finetune.py:1026] (0/2) Epoch 20, validation: loss=0.2856, simple_loss=0.3597, pruned_loss=0.1057, over 1020973.00 frames. 
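Note for readers scanning these entries: each training line reports a per-batch `loss[..., over N frames]` and a running `tot_loss[..., over M frames]`, and the `batch size` field counts utterances, so the totals are accumulated per frame rather than per batch. The sketch below is a minimal illustration of that kind of frame-weighted bookkeeping; it assumes a simple decayed average, and the class and parameter names are hypothetical — it is not the icefall `MetricsTracker` code.

```python
# Illustrative sketch only (hypothetical names, not the icefall implementation):
# a frame-weighted running average of the per-batch loss, of the same shape as the
# "loss[..., over N frames]" / "tot_loss[..., over M frames]" fields in this log.
class RunningFrameLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay      # forgetting factor that down-weights older batches
        self.loss_sum = 0.0     # decayed sum of (batch_loss * num_frames)
        self.frame_sum = 0.0    # decayed sum of num_frames

    def update(self, batch_loss: float, num_frames: float) -> None:
        # Each batch contributes in proportion to its frames, not its utterance count.
        self.loss_sum = self.loss_sum * self.decay + batch_loss * num_frames
        self.frame_sum = self.frame_sum * self.decay + num_frames

    @property
    def tot_loss(self) -> float:
        # Frame-weighted average, analogous to the running "tot_loss" value above.
        return self.loss_sum / max(self.frame_sum, 1.0)


# Usage with values of the same form as the log entries (numbers are illustrative):
tracker = RunningFrameLoss()
tracker.update(batch_loss=0.19, num_frames=6900.0)
print(f"tot_loss={tracker.tot_loss:.4f} over {tracker.frame_sum:.2f} frames")
```

Reporting the averages over frames keeps long utterances from being under-counted relative to short ones when batch composition varies, which is why the per-batch and total losses above are always qualified with a frame count.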
2023-05-19 03:37:41,836 INFO [finetune.py:1027] (0/2) Maximum memory allocated so far is 12604MB
2023-05-19 03:37:42,786 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.2667, 4.5627, 2.8276, 2.6582, 3.9757, 2.6853, 3.9249, 3.1190], device='cuda:0'), covar=tensor([0.0751, 0.0421, 0.1231, 0.1645, 0.0275, 0.1395, 0.0452, 0.0860], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0250, 0.0173, 0.0196, 0.0141, 0.0179, 0.0193, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0')
2023-05-19 03:38:04,122 INFO [scaling.py:679] (0/2) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0
2023-05-19 03:38:11,786 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.5889, 4.5686, 4.4792, 4.0958, 4.1335, 4.5442, 4.3076, 4.1495], device='cuda:0'), covar=tensor([0.0907, 0.0970, 0.0670, 0.1444, 0.2424, 0.0967, 0.1474, 0.0925], device='cuda:0'), in_proj_covar=tensor([0.0616, 0.0561, 0.0518, 0.0624, 0.0424, 0.0723, 0.0762, 0.0552], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0')
2023-05-19 03:38:15,501 INFO [finetune.py:992] (0/2) Epoch 20, batch 12050, loss[loss=0.2038, simple_loss=0.2822, pruned_loss=0.06265, over 6897.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2932, pruned_loss=0.06295, over 1686276.11 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:38:22,542 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.796e+02 3.485e+02 4.183e+02 9.162e+02, threshold=6.969e+02, percent-clipped=1.0
2023-05-19 03:38:30,182 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.4114, 3.5228, 3.1502, 3.5358, 3.3741, 2.7451, 3.2244, 2.8590], device='cuda:0'), covar=tensor([0.0983, 0.1059, 0.1789, 0.0892, 0.1476, 0.1739, 0.1319, 0.3100], device='cuda:0'), in_proj_covar=tensor([0.0300, 0.0364, 0.0346, 0.0325, 0.0360, 0.0268, 0.0336, 0.0351], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0')
2023-05-19 03:38:34,559 INFO [zipformer.py:625] (0/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=344528.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:38:44,431 INFO [zipformer.py:625] (0/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=344544.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:38:46,872 INFO [finetune.py:992] (0/2) Epoch 20, batch 12100, loss[loss=0.2112, simple_loss=0.2855, pruned_loss=0.06842, over 7184.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2925, pruned_loss=0.06219, over 1692354.62 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 16.0
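Note on the "Maximum memory allocated so far is 12604MB" line above: peak GPU memory after the validation pass can be read with PyTorch's built-in CUDA statistics. A sketch using the standard API (the exact divisor and rounding used by finetune.py are assumptions):

```python
# Sketch of the peak-memory report above using torch.cuda.max_memory_allocated.
import torch


def log_peak_cuda_memory(device: torch.device = torch.device("cuda:0")) -> None:
    if torch.cuda.is_available():
        # max_memory_allocated returns bytes; MiB divisor is an assumption here.
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")
```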
2023-05-19 03:38:50,360 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7174, 2.9630, 4.3482, 4.4984, 3.0200, 2.6284, 2.8221, 2.0934], device='cuda:0'), covar=tensor([0.1696, 0.2616, 0.0462, 0.0436, 0.1279, 0.2771, 0.3132, 0.4530], device='cuda:0'), in_proj_covar=tensor([0.0314, 0.0393, 0.0281, 0.0308, 0.0284, 0.0328, 0.0409, 0.0383], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0')
2023-05-19 03:38:59,513 INFO [zipformer.py:625] (0/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=344567.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:39:06,297 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([4.3948, 4.1883, 4.1523, 4.3609, 4.2890, 4.4171, 4.3081, 2.3245], device='cuda:0'), covar=tensor([0.0094, 0.0090, 0.0145, 0.0073, 0.0067, 0.0133, 0.0082, 0.1088], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0083, 0.0087, 0.0076, 0.0063, 0.0096, 0.0084, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0')
2023-05-19 03:39:06,347 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.6837, 2.2839, 2.9726, 2.5545, 2.8687, 2.9010, 2.1729, 2.9473], device='cuda:0'), covar=tensor([0.0153, 0.0377, 0.0173, 0.0285, 0.0167, 0.0176, 0.0408, 0.0151], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0204, 0.0192, 0.0189, 0.0219, 0.0168, 0.0198, 0.0194], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0')
2023-05-19 03:39:10,751 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([2.7898, 3.0430, 2.4089, 2.2748, 2.8236, 2.3482, 2.9705, 2.6188], device='cuda:0'), covar=tensor([0.0714, 0.0646, 0.1048, 0.1507, 0.0305, 0.1293, 0.0537, 0.0876], device='cuda:0'), in_proj_covar=tensor([0.0182, 0.0248, 0.0172, 0.0195, 0.0140, 0.0178, 0.0192, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0')
2023-05-19 03:39:15,637 INFO [zipformer.py:625] (0/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=344592.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:39:15,825 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.3597, 3.5542, 3.1701, 3.6306, 3.4057, 2.6659, 3.1804, 2.8540], device='cuda:0'), covar=tensor([0.0998, 0.1072, 0.1678, 0.0838, 0.1507, 0.1821, 0.1516, 0.3122], device='cuda:0'), in_proj_covar=tensor([0.0301, 0.0365, 0.0346, 0.0325, 0.0360, 0.0268, 0.0337, 0.0352], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0')
2023-05-19 03:39:19,331 INFO [finetune.py:992] (0/2) Epoch 20, batch 12150, loss[loss=0.2064, simple_loss=0.2976, pruned_loss=0.05763, over 11106.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2929, pruned_loss=0.0624, over 1690231.28 frames. ], batch size: 55, lr: 3.05e-03, grad_scale: 16.0
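Note on the recurring [zipformer.py:1454] diagnostics above: attn_weights_entropy reports, per attention head, the entropy of the attention distribution (alongside covariance statistics of the projections); high entropy means a head spreads its attention over many frames, low entropy means it focuses on a few. A sketch of the per-head entropy computation; the (num_heads, batch, query, key) layout is an assumption for illustration, not necessarily the tensor shape used inside the Zipformer:

```python
# Sketch of a per-head attention-entropy diagnostic like the one logged above.
import torch


def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """attn_weights: (num_heads, batch, num_queries, num_keys); rows sum to 1.

    Returns one entropy value per head, averaged over batch and query positions.
    """
    eps = 1.0e-20
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=(1, 2))


# Example: uniform attention over 50 keys gives entropy log(50) ~= 3.91 per head,
# comparable in magnitude to the logged values.
w = torch.softmax(torch.zeros(8, 2, 10, 50), dim=-1)
print(attention_entropy(w))
```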
2023-05-19 03:39:26,616 INFO [optim.py:368] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.168e+02 3.232e+02 3.821e+02 4.297e+02 9.292e+02, threshold=7.641e+02, percent-clipped=1.0
2023-05-19 03:39:30,440 INFO [zipformer.py:1454] (0/2) attn_weights_entropy = tensor([3.6504, 2.3449, 3.3736, 3.5249, 3.4863, 3.7236, 3.6002, 2.6332], device='cuda:0'), covar=tensor([0.0091, 0.0452, 0.0193, 0.0091, 0.0156, 0.0111, 0.0112, 0.0506], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0122, 0.0103, 0.0082, 0.0106, 0.0118, 0.0103, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0')
2023-05-19 03:39:38,526 INFO [zipformer.py:625] (0/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=344628.0, num_to_drop=1, layers_to_drop={3}
2023-05-19 03:39:50,400 INFO [finetune.py:992] (0/2) Epoch 20, batch 12200, loss[loss=0.2224, simple_loss=0.304, pruned_loss=0.07043, over 6890.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2938, pruned_loss=0.06331, over 1665193.75 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:39:50,766 INFO [scaling.py:679] (0/2) Whitening: num_groups=1, num_channels=384, metric=4.83 vs. limit=5.0
2023-05-19 03:40:11,137 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp_giga_finetune/epoch-20.pt
2023-05-19 03:40:16,777 INFO [finetune.py:1268] (0/2) Done!
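Note on the final [checkpoint.py:75] entry above: the run finishes by writing epoch-20.pt into pruned_transducer_stateless7/exp_giga_finetune and logging Done!. A generic sketch of what such an end-of-epoch checkpoint typically bundles so that training can resume (model, optimizer, scheduler, and AMP grad-scaler state); the key names below are assumptions, not the actual layout written by icefall's checkpoint.py:

```python
# Generic sketch of an end-of-epoch checkpoint like epoch-20.pt.
import torch


def save_epoch_checkpoint(path, model, optimizer, scheduler, scaler, sampler=None):
    checkpoint = {
        "model": model.state_dict(),          # network weights
        "optimizer": optimizer.state_dict(),  # optimizer state
        "scheduler": scheduler.state_dict(),  # LR schedule state
        "grad_scaler": scaler.state_dict(),   # AMP grad scaler state
    }
    if sampler is not None and hasattr(sampler, "state_dict"):
        checkpoint["sampler"] = sampler.state_dict()  # dataloader/sampler position
    torch.save(checkpoint, path)


# e.g. save_epoch_checkpoint(
#     "pruned_transducer_stateless7/exp_giga_finetune/epoch-20.pt",
#     model, optimizer, scheduler, scaler)
```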