2023-05-18 20:25:35,195 INFO [finetune.py:1062] (1/2) Training started
2023-05-18 20:25:35,195 INFO [finetune.py:1072] (1/2) Device: cuda:1
2023-05-18 20:25:35,200 INFO [finetune.py:1081] (1/2) {'frame_shift_ms': 10.0, 'allowed_excess_duration_ratio': 0.1, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'a23383c5a381713b51e9014f3f05d096f8aceec3', 'k2-git-date': 'Wed Apr 26 15:33:33 2023', 'lhotse-version': '1.14.0.dev+git.b61b917.dirty', 'torch-version': '1.13.1', 'torch-cuda-available': True, 'torch-cuda-version': '11.6', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': '45c13e9-dirty', 'icefall-git-date': 'Mon Apr 24 15:00:02 2023', 'icefall-path': '/k2-dev/yangyifan/icefall-master', 'k2-path': '/k2-dev/yangyifan/anaconda3/envs/icefall/lib/python3.10/site-packages/k2-1.23.4.dev20230427+cuda11.6.torch1.13.1-py3.10-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/k2-dev/yangyifan/anaconda3/envs/icefall/lib/python3.10/site-packages/lhotse-1.14.0.dev0+git.b61b917.dirty-py3.10.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-0221105906-5745685d6b-t8zzx', 'IP address': '10.177.57.19'}, 'world_size': 2, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 20, 'start_epoch': 18, 'start_batch': 0, 'exp_dir': PosixPath('pruned_transducer_stateless7/exp_giga_finetune'), 'bpe_model': 'icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11/data/lang_bpe_500/bpe.model', 'base_lr': 0.005, 'lr_batches': 100000.0, 'lr_epochs': 100.0, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'do_finetune': True, 'use_mux': True, 'init_modules': None, 'finetune_ckpt': None, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 500, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'subset': 'S', 'small_dev': False, 'blank_id': 0, 'vocab_size': 500}
2023-05-18 20:25:35,200 INFO [finetune.py:1083] (1/2) About to create model
2023-05-18 20:25:35,864 INFO [zipformer.py:178] (1/2) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
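Note: the per-stack options in the configuration dump above (num_encoder_layers, encoder_dims, zipformer_downsampling_factors, and so on) are comma-separated strings carrying one value per Zipformer encoder stack. A minimal Python sketch of how they line up with the zipformer.py message about stack 4; the per_stack helper is hypothetical and not icefall's actual parsing code:

def per_stack(s):
    # Hypothetical helper: split a per-stack option string into one int per encoder stack.
    return tuple(int(v) for v in s.split(","))

num_encoder_layers = per_stack("2,4,3,2,4")   # five Zipformer encoder stacks
downsampling = per_stack("1,2,4,8,2")         # zipformer_downsampling_factors
# Stack 4 has downsampling_factor=2, and stacks 1 and 3 (factors 2 and 8) feed its
# output combiner, which matches the "At encoder stack 4 ..." log record above.
assert downsampling[4] == 2
assert (downsampling[1], downsampling[3]) == (2, 8)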
2023-05-18 20:25:35,880 INFO [finetune.py:1087] (1/2) Number of model parameters: 70369391
2023-05-18 20:25:35,880 INFO [checkpoint.py:112] (1/2) Loading checkpoint from pruned_transducer_stateless7/exp_giga_finetune/epoch-17.pt
2023-05-18 20:25:42,078 INFO [finetune.py:1109] (1/2) Using DDP
2023-05-18 20:25:42,250 INFO [finetune.py:1129] (1/2) Loading optimizer state dict
2023-05-18 20:25:42,735 INFO [finetune.py:1137] (1/2) Loading scheduler state dict
2023-05-18 20:25:42,735 INFO [asr_datamodule.py:425] (1/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2023-05-18 20:25:42,737 INFO [gigaspeech.py:389] (1/2) About to get train_S cuts
2023-05-18 20:25:42,737 INFO [gigaspeech.py:216] (1/2) Enable MUSAN
2023-05-18 20:25:42,737 INFO [gigaspeech.py:217] (1/2) About to get Musan cuts
2023-05-18 20:25:44,770 INFO [gigaspeech.py:241] (1/2) Enable SpecAugment
2023-05-18 20:25:44,770 INFO [gigaspeech.py:242] (1/2) Time warp factor: 80
2023-05-18 20:25:44,770 INFO [gigaspeech.py:252] (1/2) Num frame mask: 10
2023-05-18 20:25:44,771 INFO [gigaspeech.py:265] (1/2) About to create train dataset
2023-05-18 20:25:44,771 INFO [gigaspeech.py:291] (1/2) Using DynamicBucketingSampler.
2023-05-18 20:25:49,491 INFO [gigaspeech.py:306] (1/2) About to create train dataloader
2023-05-18 20:25:49,492 INFO [gigaspeech.py:396] (1/2) About to get dev cuts
2023-05-18 20:25:49,493 INFO [gigaspeech.py:337] (1/2) About to create dev dataset
2023-05-18 20:25:49,818 INFO [gigaspeech.py:354] (1/2) About to create dev dataloader
2023-05-18 20:25:49,818 INFO [finetune.py:1225] (1/2) Loading grad scaler state dict
2023-05-18 20:26:07,167 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2024, 4.8560, 5.2040, 4.6468, 4.9327, 4.7904, 5.2589, 4.9076], device='cuda:1'), covar=tensor([0.0330, 0.0400, 0.0286, 0.0298, 0.0431, 0.0394, 0.0186, 0.0283], device='cuda:1'), in_proj_covar=tensor([0.0259, 0.0264, 0.0285, 0.0262, 0.0258, 0.0257, 0.0235, 0.0209], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1')
2023-05-18 20:26:08,253 WARNING [optim.py:388] (1/2) Scaling gradients by 0.045329876244068146, model_norm_threshold=726.4490966796875
2023-05-18 20:26:08,363 INFO [optim.py:450] (1/2) Parameter dominating tot_sumsq module.encoder.encoders.3.out_combiner.weight1 with proportion 0.72, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.848e+08, grad_sumsq = 1.848e+08, orig_rms_sq=1.000e+00
2023-05-18 20:26:08,387 INFO [finetune.py:992] (1/2) Epoch 18, batch 0, loss[loss=0.3907, simple_loss=0.44, pruned_loss=0.1707, over 12183.00 frames. ], tot_loss[loss=0.3907, simple_loss=0.44, pruned_loss=0.1707, over 12183.00 frames. ], batch size: 35, lr: 3.27e-03, grad_scale: 8.0
2023-05-18 20:26:08,387 INFO [finetune.py:1017] (1/2) Computing validation loss
2023-05-18 20:26:25,050 INFO [finetune.py:1026] (1/2) Epoch 18, validation: loss=0.2903, simple_loss=0.3616, pruned_loss=0.1095, over 1020973.00 frames.
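Note: the records above show the standard resume sequence before fine-tuning continues at epoch 18: load the epoch-17 checkpoint, wrap the model in DDP, then restore the optimizer, scheduler, and grad-scaler state (use_fp16=True). A minimal sketch of that pattern in plain PyTorch; the checkpoint key names and the resume_finetune helper are assumptions, not taken from this log:

import torch

def resume_finetune(exp_dir, model, optimizer, scheduler, scaler, epoch=17):
    # Sketch only; the key names ("model", "optimizer", ...) are assumed.
    ckpt = torch.load(f"{exp_dir}/epoch-{epoch}.pt", map_location="cpu")
    model.load_state_dict(ckpt["model"])              # "Loading checkpoint from .../epoch-17.pt"
    optimizer.load_state_dict(ckpt["optimizer"])      # "Loading optimizer state dict"
    scheduler.load_state_dict(ckpt["scheduler"])      # "Loading scheduler state dict"
    if scaler is not None and ckpt.get("grad_scaler") is not None:
        scaler.load_state_dict(ckpt["grad_scaler"])   # "Loading grad scaler state dict"
    # With world_size=2, the model would then be wrapped for multi-GPU training:
    # model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
    return ckpt

The grad_scale values reported in the batch records (for example grad_scale: 8.0 at batch 0) appear to be the running scale of this fp16 grad scaler.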
2023-05-18 20:26:25,051 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 10926MB 2023-05-18 20:26:33,576 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3805, 2.5206, 3.6116, 4.3362, 3.5927, 4.3637, 3.6816, 2.7540], device='cuda:1'), covar=tensor([0.0052, 0.0529, 0.0179, 0.0058, 0.0201, 0.0101, 0.0158, 0.0605], device='cuda:1'), in_proj_covar=tensor([0.0091, 0.0125, 0.0106, 0.0080, 0.0106, 0.0118, 0.0103, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:26:34,540 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.01 vs. limit=5.0 2023-05-18 20:26:38,419 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 3.327e+02 3.889e+02 4.846e+02 1.603e+04, threshold=7.779e+02, percent-clipped=2.0 2023-05-18 20:26:39,307 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=307998.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:26:46,157 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2023-05-18 20:26:56,370 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6878, 3.5959, 3.2774, 3.2415, 2.9352, 2.8233, 3.5678, 2.5413], device='cuda:1'), covar=tensor([0.0437, 0.0185, 0.0233, 0.0248, 0.0432, 0.0364, 0.0164, 0.0512], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0159, 0.0164, 0.0188, 0.0199, 0.0194, 0.0173, 0.0200], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:27:05,533 INFO [finetune.py:992] (1/2) Epoch 18, batch 50, loss[loss=0.1803, simple_loss=0.2781, pruned_loss=0.04131, over 12088.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.2658, pruned_loss=0.04266, over 541566.90 frames. ], batch size: 32, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:27:16,479 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1705, 3.5212, 3.5565, 3.9181, 2.9363, 3.5139, 2.3876, 3.3416], device='cuda:1'), covar=tensor([0.2005, 0.0954, 0.1085, 0.0749, 0.1204, 0.0794, 0.2213, 0.1103], device='cuda:1'), in_proj_covar=tensor([0.0225, 0.0262, 0.0288, 0.0345, 0.0237, 0.0237, 0.0254, 0.0356], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:27:17,008 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308046.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:27:41,406 INFO [finetune.py:992] (1/2) Epoch 18, batch 100, loss[loss=0.1596, simple_loss=0.2373, pruned_loss=0.04093, over 12314.00 frames. ], tot_loss[loss=0.1694, simple_loss=0.2587, pruned_loss=0.04005, over 942256.54 frames. ], batch size: 28, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:27:53,420 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.648e+02 3.135e+02 3.749e+02 6.822e+02, threshold=6.269e+02, percent-clipped=0.0 2023-05-18 20:28:01,155 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-05-18 20:28:01,554 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. 
limit=2.0 2023-05-18 20:28:16,095 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0704, 4.9747, 4.9132, 4.9808, 4.6684, 5.1128, 5.1012, 5.3120], device='cuda:1'), covar=tensor([0.0372, 0.0196, 0.0230, 0.0375, 0.0836, 0.0370, 0.0209, 0.0182], device='cuda:1'), in_proj_covar=tensor([0.0186, 0.0186, 0.0180, 0.0233, 0.0225, 0.0207, 0.0166, 0.0220], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:1') 2023-05-18 20:28:17,246 INFO [finetune.py:992] (1/2) Epoch 18, batch 150, loss[loss=0.15, simple_loss=0.2354, pruned_loss=0.03225, over 12129.00 frames. ], tot_loss[loss=0.1692, simple_loss=0.2594, pruned_loss=0.03957, over 1269084.16 frames. ], batch size: 30, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:28:52,837 INFO [finetune.py:992] (1/2) Epoch 18, batch 200, loss[loss=0.151, simple_loss=0.2378, pruned_loss=0.03213, over 12293.00 frames. ], tot_loss[loss=0.1681, simple_loss=0.2585, pruned_loss=0.03885, over 1516186.57 frames. ], batch size: 28, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:29:04,737 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.636e+02 3.102e+02 3.454e+02 6.960e+02, threshold=6.205e+02, percent-clipped=1.0 2023-05-18 20:29:08,681 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308202.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:29:27,945 INFO [finetune.py:992] (1/2) Epoch 18, batch 250, loss[loss=0.1453, simple_loss=0.234, pruned_loss=0.02828, over 12299.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.2577, pruned_loss=0.03866, over 1709010.91 frames. ], batch size: 33, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:29:44,734 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308254.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:29:48,945 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308260.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:29:51,151 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308263.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:00,155 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308276.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:03,374 INFO [finetune.py:992] (1/2) Epoch 18, batch 300, loss[loss=0.1732, simple_loss=0.2571, pruned_loss=0.04467, over 12292.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2573, pruned_loss=0.03825, over 1866545.48 frames. ], batch size: 33, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:30:15,169 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.684e+02 3.202e+02 3.924e+02 5.708e+02, threshold=6.404e+02, percent-clipped=0.0 2023-05-18 20:30:18,637 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308302.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:23,319 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308308.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:24,118 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308309.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:34,572 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308324.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:30:38,490 INFO [finetune.py:992] (1/2) Epoch 18, batch 350, loss[loss=0.1373, simple_loss=0.2263, pruned_loss=0.02419, over 12341.00 frames. 
], tot_loss[loss=0.1661, simple_loss=0.2567, pruned_loss=0.03775, over 1984235.31 frames. ], batch size: 30, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:31:06,519 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308370.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:31:13,301 INFO [finetune.py:992] (1/2) Epoch 18, batch 400, loss[loss=0.1489, simple_loss=0.2363, pruned_loss=0.03077, over 12355.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.2562, pruned_loss=0.03763, over 2070574.22 frames. ], batch size: 31, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:31:19,195 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-18 20:31:24,827 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.628e+02 2.483e+02 3.007e+02 3.597e+02 5.788e+02, threshold=6.013e+02, percent-clipped=0.0 2023-05-18 20:31:47,945 INFO [finetune.py:992] (1/2) Epoch 18, batch 450, loss[loss=0.1498, simple_loss=0.2317, pruned_loss=0.03391, over 12364.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.2563, pruned_loss=0.03774, over 2136290.12 frames. ], batch size: 30, lr: 3.27e-03, grad_scale: 8.0 2023-05-18 20:31:56,097 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4231, 4.9411, 5.3825, 4.7030, 5.0696, 4.7965, 5.4326, 5.1203], device='cuda:1'), covar=tensor([0.0270, 0.0428, 0.0307, 0.0280, 0.0404, 0.0394, 0.0200, 0.0296], device='cuda:1'), in_proj_covar=tensor([0.0270, 0.0276, 0.0299, 0.0272, 0.0269, 0.0269, 0.0245, 0.0219], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:32:23,754 INFO [finetune.py:992] (1/2) Epoch 18, batch 500, loss[loss=0.1955, simple_loss=0.2813, pruned_loss=0.05481, over 10463.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.2556, pruned_loss=0.03753, over 2195315.56 frames. ], batch size: 68, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:32:35,720 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.630e+02 3.100e+02 3.585e+02 6.285e+02, threshold=6.200e+02, percent-clipped=2.0 2023-05-18 20:32:36,583 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5228, 5.3416, 5.4818, 5.5046, 5.1293, 5.1941, 4.8873, 5.4257], device='cuda:1'), covar=tensor([0.0826, 0.0680, 0.0879, 0.0693, 0.2119, 0.1371, 0.0657, 0.1164], device='cuda:1'), in_proj_covar=tensor([0.0544, 0.0704, 0.0624, 0.0633, 0.0839, 0.0748, 0.0561, 0.0489], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:32:54,032 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.0702, 6.1002, 5.8040, 5.4664, 5.2642, 5.9860, 5.6119, 5.3903], device='cuda:1'), covar=tensor([0.0716, 0.0809, 0.0695, 0.1581, 0.0702, 0.0726, 0.1677, 0.1221], device='cuda:1'), in_proj_covar=tensor([0.0650, 0.0581, 0.0530, 0.0650, 0.0436, 0.0738, 0.0793, 0.0581], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-18 20:32:58,588 INFO [finetune.py:992] (1/2) Epoch 18, batch 550, loss[loss=0.1773, simple_loss=0.2712, pruned_loss=0.04171, over 12301.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.2547, pruned_loss=0.03713, over 2239291.52 frames. 
], batch size: 34, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:33:14,357 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308552.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:33:18,204 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=308558.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:33:33,281 INFO [finetune.py:992] (1/2) Epoch 18, batch 600, loss[loss=0.1584, simple_loss=0.2507, pruned_loss=0.03306, over 12350.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.253, pruned_loss=0.0368, over 2270255.34 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:33:45,727 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.534e+02 3.033e+02 3.710e+02 8.402e+02, threshold=6.066e+02, percent-clipped=2.0 2023-05-18 20:33:58,013 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308613.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:34:09,480 INFO [finetune.py:992] (1/2) Epoch 18, batch 650, loss[loss=0.1664, simple_loss=0.2695, pruned_loss=0.03163, over 12202.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03641, over 2297544.77 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:34:26,842 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.80 vs. limit=5.0 2023-05-18 20:34:33,424 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=308665.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:34:36,139 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308669.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:34:43,510 INFO [finetune.py:992] (1/2) Epoch 18, batch 700, loss[loss=0.1696, simple_loss=0.2649, pruned_loss=0.03714, over 11813.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2543, pruned_loss=0.03664, over 2310666.32 frames. ], batch size: 44, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:34:55,200 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.797e+02 2.630e+02 3.034e+02 3.699e+02 6.493e+02, threshold=6.068e+02, percent-clipped=1.0 2023-05-18 20:35:17,798 INFO [finetune.py:992] (1/2) Epoch 18, batch 750, loss[loss=0.1671, simple_loss=0.2535, pruned_loss=0.04038, over 12277.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2541, pruned_loss=0.03671, over 2321091.52 frames. ], batch size: 33, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:35:17,971 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308730.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:35:19,289 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=308732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:35:23,309 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2025, 4.8024, 3.0155, 2.7921, 4.0930, 2.7886, 4.1255, 3.2480], device='cuda:1'), covar=tensor([0.0793, 0.0425, 0.1241, 0.1505, 0.0299, 0.1300, 0.0406, 0.0919], device='cuda:1'), in_proj_covar=tensor([0.0188, 0.0252, 0.0177, 0.0199, 0.0140, 0.0183, 0.0198, 0.0174], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:35:53,186 INFO [finetune.py:992] (1/2) Epoch 18, batch 800, loss[loss=0.1529, simple_loss=0.2509, pruned_loss=0.02747, over 12156.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2547, pruned_loss=0.03699, over 2329917.28 frames. 
], batch size: 36, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:36:02,528 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=308793.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:36:04,942 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.864e+02 2.681e+02 3.102e+02 3.780e+02 6.537e+02, threshold=6.204e+02, percent-clipped=1.0 2023-05-18 20:36:28,167 INFO [finetune.py:992] (1/2) Epoch 18, batch 850, loss[loss=0.1714, simple_loss=0.2676, pruned_loss=0.03764, over 12018.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2542, pruned_loss=0.03687, over 2346868.06 frames. ], batch size: 40, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:36:47,821 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308858.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:37:02,664 INFO [finetune.py:992] (1/2) Epoch 18, batch 900, loss[loss=0.1811, simple_loss=0.2746, pruned_loss=0.04384, over 11661.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2538, pruned_loss=0.03684, over 2349813.45 frames. ], batch size: 48, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:37:11,377 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0998, 4.4872, 3.8637, 4.7823, 4.3483, 2.6312, 4.0006, 2.9508], device='cuda:1'), covar=tensor([0.0953, 0.0844, 0.1725, 0.0581, 0.1305, 0.2064, 0.1208, 0.3579], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0384, 0.0366, 0.0335, 0.0376, 0.0280, 0.0353, 0.0373], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:37:15,123 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.650e+02 2.566e+02 3.018e+02 3.526e+02 6.170e+02, threshold=6.037e+02, percent-clipped=0.0 2023-05-18 20:37:22,012 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=308906.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:37:23,384 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=308908.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:37:28,438 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-05-18 20:37:38,567 INFO [finetune.py:992] (1/2) Epoch 18, batch 950, loss[loss=0.1444, simple_loss=0.227, pruned_loss=0.03086, over 12277.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2535, pruned_loss=0.03687, over 2350756.60 frames. ], batch size: 28, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:37:59,857 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-18 20:38:03,079 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=308965.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:38:13,228 INFO [finetune.py:992] (1/2) Epoch 18, batch 1000, loss[loss=0.147, simple_loss=0.2364, pruned_loss=0.02886, over 12337.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2533, pruned_loss=0.03691, over 2356103.98 frames. 
], batch size: 30, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:38:24,915 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.628e+02 3.271e+02 3.766e+02 7.611e+02, threshold=6.542e+02, percent-clipped=1.0 2023-05-18 20:38:25,218 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4131, 4.1567, 4.2338, 4.4894, 3.0286, 4.0798, 2.7284, 4.2279], device='cuda:1'), covar=tensor([0.1769, 0.0751, 0.0900, 0.0569, 0.1253, 0.0593, 0.1896, 0.1021], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0272, 0.0300, 0.0358, 0.0246, 0.0245, 0.0264, 0.0372], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:38:35,941 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4362, 4.9975, 5.4180, 4.7400, 5.0832, 4.8523, 5.4311, 5.0655], device='cuda:1'), covar=tensor([0.0257, 0.0388, 0.0273, 0.0256, 0.0425, 0.0325, 0.0234, 0.0297], device='cuda:1'), in_proj_covar=tensor([0.0272, 0.0278, 0.0301, 0.0273, 0.0272, 0.0270, 0.0245, 0.0221], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:38:36,524 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309013.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:38:41,027 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-05-18 20:38:44,754 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309025.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:38:48,164 INFO [finetune.py:992] (1/2) Epoch 18, batch 1050, loss[loss=0.1461, simple_loss=0.2475, pruned_loss=0.02237, over 12148.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2529, pruned_loss=0.03653, over 2369330.02 frames. ], batch size: 34, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:38:51,138 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4960, 2.3014, 3.8547, 4.4972, 4.0844, 4.3792, 4.0105, 3.1493], device='cuda:1'), covar=tensor([0.0055, 0.0609, 0.0136, 0.0051, 0.0120, 0.0117, 0.0116, 0.0446], device='cuda:1'), in_proj_covar=tensor([0.0094, 0.0127, 0.0108, 0.0082, 0.0108, 0.0122, 0.0106, 0.0144], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:39:02,074 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4044, 3.6401, 3.8607, 4.1973, 2.6897, 3.5862, 2.5330, 3.7647], device='cuda:1'), covar=tensor([0.1588, 0.1038, 0.1202, 0.0898, 0.1424, 0.0809, 0.1968, 0.1162], device='cuda:1'), in_proj_covar=tensor([0.0235, 0.0273, 0.0301, 0.0360, 0.0247, 0.0246, 0.0265, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:39:05,465 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0434, 4.7164, 4.8508, 4.9031, 4.6572, 5.0117, 4.8112, 2.7212], device='cuda:1'), covar=tensor([0.0092, 0.0063, 0.0083, 0.0061, 0.0052, 0.0084, 0.0080, 0.0771], device='cuda:1'), in_proj_covar=tensor([0.0071, 0.0080, 0.0085, 0.0075, 0.0061, 0.0095, 0.0083, 0.0100], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:39:23,863 INFO [finetune.py:992] (1/2) Epoch 18, batch 1100, loss[loss=0.1875, simple_loss=0.2825, pruned_loss=0.04626, over 12055.00 frames. 
], tot_loss[loss=0.163, simple_loss=0.2529, pruned_loss=0.03648, over 2375581.95 frames. ], batch size: 37, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:39:24,259 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.84 vs. limit=5.0 2023-05-18 20:39:29,465 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309088.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:39:35,510 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.720e+02 3.175e+02 3.829e+02 6.292e+02, threshold=6.350e+02, percent-clipped=0.0 2023-05-18 20:39:44,375 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-05-18 20:39:58,583 INFO [finetune.py:992] (1/2) Epoch 18, batch 1150, loss[loss=0.1434, simple_loss=0.2326, pruned_loss=0.02706, over 11806.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2531, pruned_loss=0.03619, over 2377305.66 frames. ], batch size: 26, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:40:04,612 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-18 20:40:33,669 INFO [finetune.py:992] (1/2) Epoch 18, batch 1200, loss[loss=0.1678, simple_loss=0.2577, pruned_loss=0.03899, over 11282.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2523, pruned_loss=0.03595, over 2371783.12 frames. ], batch size: 55, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:40:45,659 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.688e+02 3.187e+02 3.612e+02 5.353e+02, threshold=6.374e+02, percent-clipped=0.0 2023-05-18 20:40:54,967 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=309208.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:41:09,972 INFO [finetune.py:992] (1/2) Epoch 18, batch 1250, loss[loss=0.1659, simple_loss=0.2634, pruned_loss=0.03418, over 12201.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2524, pruned_loss=0.03583, over 2373156.40 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 8.0 2023-05-18 20:41:28,405 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309256.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:41:45,067 INFO [finetune.py:992] (1/2) Epoch 18, batch 1300, loss[loss=0.1463, simple_loss=0.2453, pruned_loss=0.02371, over 12342.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2525, pruned_loss=0.03584, over 2362641.78 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:41:56,943 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.593e+02 2.360e+02 2.832e+02 3.330e+02 7.735e+02, threshold=5.664e+02, percent-clipped=3.0 2023-05-18 20:42:05,526 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.70 vs. limit=5.0 2023-05-18 20:42:16,300 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=309325.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:42:19,793 INFO [finetune.py:992] (1/2) Epoch 18, batch 1350, loss[loss=0.1552, simple_loss=0.2561, pruned_loss=0.02715, over 12267.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2527, pruned_loss=0.03574, over 2371365.49 frames. 
], batch size: 37, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:42:50,542 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309373.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:42:55,226 INFO [finetune.py:992] (1/2) Epoch 18, batch 1400, loss[loss=0.1699, simple_loss=0.2604, pruned_loss=0.03972, over 11653.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.03643, over 2371301.00 frames. ], batch size: 48, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:43:00,857 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=309388.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:43:06,917 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.658e+02 3.169e+02 3.760e+02 1.278e+03, threshold=6.339e+02, percent-clipped=2.0 2023-05-18 20:43:07,271 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-18 20:43:16,953 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309411.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:43:30,047 INFO [finetune.py:992] (1/2) Epoch 18, batch 1450, loss[loss=0.1912, simple_loss=0.283, pruned_loss=0.04971, over 12120.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2535, pruned_loss=0.03642, over 2373987.81 frames. ], batch size: 38, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:43:34,269 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=309436.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:43:34,583 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 20:43:59,398 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309472.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:44:04,856 INFO [finetune.py:992] (1/2) Epoch 18, batch 1500, loss[loss=0.1635, simple_loss=0.2577, pruned_loss=0.03465, over 12187.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2533, pruned_loss=0.0367, over 2374621.45 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:44:17,150 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.631e+02 3.171e+02 3.851e+02 8.126e+02, threshold=6.342e+02, percent-clipped=2.0 2023-05-18 20:44:40,497 INFO [finetune.py:992] (1/2) Epoch 18, batch 1550, loss[loss=0.1922, simple_loss=0.2806, pruned_loss=0.0519, over 12139.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2536, pruned_loss=0.03681, over 2379610.10 frames. 
], batch size: 38, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:44:44,903 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5566, 3.7304, 3.2900, 3.2956, 2.8533, 2.7825, 3.6477, 2.4192], device='cuda:1'), covar=tensor([0.0453, 0.0142, 0.0236, 0.0212, 0.0446, 0.0409, 0.0173, 0.0589], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0167, 0.0172, 0.0196, 0.0208, 0.0205, 0.0181, 0.0210], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:44:59,491 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1835, 4.5906, 4.0219, 4.7897, 4.4282, 2.8715, 4.0966, 2.9679], device='cuda:1'), covar=tensor([0.0947, 0.0882, 0.1609, 0.0647, 0.1270, 0.1895, 0.1272, 0.3635], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0386, 0.0368, 0.0338, 0.0380, 0.0283, 0.0356, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:45:10,454 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309573.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:45:14,845 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0702, 4.4900, 3.8823, 4.6652, 4.2752, 2.6659, 3.9521, 2.8539], device='cuda:1'), covar=tensor([0.1014, 0.0826, 0.1613, 0.0710, 0.1343, 0.2108, 0.1396, 0.3854], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0386, 0.0367, 0.0338, 0.0379, 0.0283, 0.0355, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:45:15,292 INFO [finetune.py:992] (1/2) Epoch 18, batch 1600, loss[loss=0.1612, simple_loss=0.2614, pruned_loss=0.03049, over 12030.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2529, pruned_loss=0.0363, over 2383312.97 frames. ], batch size: 40, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:45:24,453 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6051, 2.7079, 3.8123, 4.6532, 3.9592, 4.6803, 4.1237, 3.5003], device='cuda:1'), covar=tensor([0.0048, 0.0436, 0.0153, 0.0044, 0.0173, 0.0072, 0.0120, 0.0357], device='cuda:1'), in_proj_covar=tensor([0.0094, 0.0127, 0.0108, 0.0082, 0.0108, 0.0121, 0.0105, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:45:27,055 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.535e+02 2.970e+02 3.567e+02 5.666e+02, threshold=5.940e+02, percent-clipped=0.0 2023-05-18 20:45:45,090 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4730, 3.6313, 3.7938, 4.1809, 2.6997, 3.5612, 2.4672, 3.8817], device='cuda:1'), covar=tensor([0.1560, 0.1132, 0.1406, 0.0938, 0.1520, 0.0860, 0.2161, 0.1168], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0274, 0.0303, 0.0363, 0.0248, 0.0247, 0.0265, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:45:50,312 INFO [finetune.py:992] (1/2) Epoch 18, batch 1650, loss[loss=0.1615, simple_loss=0.2472, pruned_loss=0.03794, over 12175.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.03586, over 2388854.72 frames. 
], batch size: 31, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:45:51,924 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4332, 2.5595, 3.0591, 4.3068, 2.2732, 4.2654, 4.4129, 4.4353], device='cuda:1'), covar=tensor([0.0130, 0.1275, 0.0614, 0.0120, 0.1414, 0.0311, 0.0138, 0.0105], device='cuda:1'), in_proj_covar=tensor([0.0126, 0.0207, 0.0185, 0.0124, 0.0191, 0.0183, 0.0182, 0.0128], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:45:52,661 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4341, 3.2975, 4.8212, 2.5516, 2.7016, 3.5529, 3.0994, 3.6219], device='cuda:1'), covar=tensor([0.0501, 0.1248, 0.0360, 0.1362, 0.2144, 0.1687, 0.1482, 0.1286], device='cuda:1'), in_proj_covar=tensor([0.0239, 0.0241, 0.0260, 0.0187, 0.0243, 0.0299, 0.0230, 0.0272], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:45:53,326 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309634.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:46:12,743 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.16 vs. limit=5.0 2023-05-18 20:46:26,015 INFO [finetune.py:992] (1/2) Epoch 18, batch 1700, loss[loss=0.1812, simple_loss=0.2824, pruned_loss=0.04001, over 12357.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2525, pruned_loss=0.03578, over 2390327.17 frames. ], batch size: 35, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:46:37,506 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.745e+02 3.098e+02 3.777e+02 1.829e+03, threshold=6.196e+02, percent-clipped=5.0 2023-05-18 20:47:00,277 INFO [finetune.py:992] (1/2) Epoch 18, batch 1750, loss[loss=0.1502, simple_loss=0.2435, pruned_loss=0.02851, over 12293.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.253, pruned_loss=0.03605, over 2393190.95 frames. ], batch size: 34, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:47:25,824 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309767.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:47:34,897 INFO [finetune.py:992] (1/2) Epoch 18, batch 1800, loss[loss=0.2306, simple_loss=0.309, pruned_loss=0.07616, over 8715.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2532, pruned_loss=0.03631, over 2387422.69 frames. ], batch size: 101, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:47:46,891 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.723e+02 3.116e+02 3.628e+02 7.661e+02, threshold=6.232e+02, percent-clipped=3.0 2023-05-18 20:47:51,784 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.64 vs. limit=2.0 2023-05-18 20:48:10,630 INFO [finetune.py:992] (1/2) Epoch 18, batch 1850, loss[loss=0.1604, simple_loss=0.2576, pruned_loss=0.03162, over 12151.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.253, pruned_loss=0.03611, over 2392588.86 frames. 
], batch size: 36, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:48:12,356 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1128, 4.4655, 3.8794, 4.7638, 4.3510, 2.7440, 4.0678, 2.9005], device='cuda:1'), covar=tensor([0.0924, 0.0937, 0.1617, 0.0632, 0.1233, 0.1980, 0.1163, 0.3731], device='cuda:1'), in_proj_covar=tensor([0.0319, 0.0388, 0.0371, 0.0341, 0.0382, 0.0284, 0.0358, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:48:21,908 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309846.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:48:28,320 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309855.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:48:45,483 INFO [finetune.py:992] (1/2) Epoch 18, batch 1900, loss[loss=0.1516, simple_loss=0.2364, pruned_loss=0.03346, over 11824.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2525, pruned_loss=0.03603, over 2398332.74 frames. ], batch size: 26, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:48:57,072 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0 2023-05-18 20:48:57,369 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.578e+02 3.098e+02 3.456e+02 8.293e+02, threshold=6.197e+02, percent-clipped=1.0 2023-05-18 20:49:04,609 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309907.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 20:49:10,875 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=309916.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:49:11,134 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-05-18 20:49:19,598 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=309929.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:49:20,279 INFO [finetune.py:992] (1/2) Epoch 18, batch 1950, loss[loss=0.1738, simple_loss=0.2536, pruned_loss=0.04707, over 12181.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2537, pruned_loss=0.03653, over 2391278.76 frames. ], batch size: 29, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:49:43,863 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4239, 4.6500, 4.1274, 4.9489, 4.5716, 2.9993, 4.1646, 3.1151], device='cuda:1'), covar=tensor([0.0901, 0.0833, 0.1478, 0.0569, 0.1262, 0.1760, 0.1249, 0.3551], device='cuda:1'), in_proj_covar=tensor([0.0319, 0.0388, 0.0370, 0.0340, 0.0381, 0.0284, 0.0357, 0.0378], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:49:56,067 INFO [finetune.py:992] (1/2) Epoch 18, batch 2000, loss[loss=0.1668, simple_loss=0.2597, pruned_loss=0.03695, over 11125.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2537, pruned_loss=0.03642, over 2385646.39 frames. 
], batch size: 55, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:49:56,905 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=309981.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:50:08,034 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.535e+02 2.884e+02 3.403e+02 2.001e+03, threshold=5.769e+02, percent-clipped=2.0 2023-05-18 20:50:33,271 INFO [finetune.py:992] (1/2) Epoch 18, batch 2050, loss[loss=0.2064, simple_loss=0.3024, pruned_loss=0.05514, over 8425.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2525, pruned_loss=0.03593, over 2391595.53 frames. ], batch size: 98, lr: 3.26e-03, grad_scale: 16.0 2023-05-18 20:50:41,728 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310042.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:50:58,480 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310067.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:50:58,569 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3947, 2.4012, 3.6824, 4.4163, 3.8119, 4.3781, 3.8094, 3.1783], device='cuda:1'), covar=tensor([0.0053, 0.0413, 0.0157, 0.0057, 0.0140, 0.0088, 0.0146, 0.0373], device='cuda:1'), in_proj_covar=tensor([0.0094, 0.0127, 0.0108, 0.0082, 0.0109, 0.0121, 0.0106, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:51:07,309 INFO [finetune.py:992] (1/2) Epoch 18, batch 2100, loss[loss=0.1771, simple_loss=0.2677, pruned_loss=0.04323, over 12062.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2537, pruned_loss=0.03643, over 2385894.19 frames. ], batch size: 40, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:51:20,293 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.635e+02 3.197e+02 3.890e+02 6.389e+02, threshold=6.395e+02, percent-clipped=3.0 2023-05-18 20:51:32,961 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310115.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:51:43,382 INFO [finetune.py:992] (1/2) Epoch 18, batch 2150, loss[loss=0.1343, simple_loss=0.2148, pruned_loss=0.02693, over 12266.00 frames. ], tot_loss[loss=0.163, simple_loss=0.253, pruned_loss=0.03652, over 2373698.25 frames. ], batch size: 28, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:52:18,104 INFO [finetune.py:992] (1/2) Epoch 18, batch 2200, loss[loss=0.1846, simple_loss=0.282, pruned_loss=0.04361, over 12270.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03652, over 2377146.66 frames. ], batch size: 37, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:52:25,581 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. 
limit=2.0 2023-05-18 20:52:29,808 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.512e+02 2.961e+02 3.502e+02 5.832e+02, threshold=5.922e+02, percent-clipped=0.0 2023-05-18 20:52:33,726 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310202.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 20:52:39,866 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310211.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:52:46,343 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0170, 4.7257, 4.8612, 4.9302, 4.6772, 4.9621, 4.8699, 2.6881], device='cuda:1'), covar=tensor([0.0114, 0.0069, 0.0087, 0.0057, 0.0061, 0.0107, 0.0075, 0.0858], device='cuda:1'), in_proj_covar=tensor([0.0072, 0.0082, 0.0086, 0.0076, 0.0063, 0.0097, 0.0085, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:52:46,381 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3530, 4.6951, 2.9852, 2.8966, 4.0783, 2.6941, 3.9717, 3.2126], device='cuda:1'), covar=tensor([0.0728, 0.0578, 0.1118, 0.1393, 0.0315, 0.1320, 0.0502, 0.0809], device='cuda:1'), in_proj_covar=tensor([0.0188, 0.0258, 0.0178, 0.0200, 0.0142, 0.0184, 0.0201, 0.0176], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:52:52,417 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310229.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:52:53,008 INFO [finetune.py:992] (1/2) Epoch 18, batch 2250, loss[loss=0.17, simple_loss=0.2577, pruned_loss=0.04112, over 8122.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.0364, over 2373499.66 frames. ], batch size: 97, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:53:20,096 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2754, 4.7404, 4.2505, 4.9821, 4.5220, 2.9250, 4.2237, 3.1440], device='cuda:1'), covar=tensor([0.0908, 0.0751, 0.1317, 0.0521, 0.1159, 0.1799, 0.1095, 0.3439], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0382, 0.0365, 0.0336, 0.0374, 0.0280, 0.0350, 0.0371], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:53:26,752 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310277.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:53:28,803 INFO [finetune.py:992] (1/2) Epoch 18, batch 2300, loss[loss=0.1634, simple_loss=0.255, pruned_loss=0.03591, over 12104.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2529, pruned_loss=0.0364, over 2382879.22 frames. 
], batch size: 32, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:53:33,852 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=310287.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:53:40,754 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.561e+02 3.053e+02 3.561e+02 7.205e+02, threshold=6.106e+02, percent-clipped=2.0 2023-05-18 20:53:45,899 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4109, 2.3831, 3.7611, 4.5516, 3.9883, 4.4658, 3.9483, 3.1744], device='cuda:1'), covar=tensor([0.0058, 0.0468, 0.0151, 0.0035, 0.0129, 0.0089, 0.0139, 0.0401], device='cuda:1'), in_proj_covar=tensor([0.0094, 0.0127, 0.0109, 0.0083, 0.0109, 0.0121, 0.0106, 0.0144], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 20:54:03,551 INFO [finetune.py:992] (1/2) Epoch 18, batch 2350, loss[loss=0.1554, simple_loss=0.2496, pruned_loss=0.03062, over 12343.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03606, over 2385380.13 frames. ], batch size: 31, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:54:08,495 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310337.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:54:16,242 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310348.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:54:38,138 INFO [finetune.py:992] (1/2) Epoch 18, batch 2400, loss[loss=0.1862, simple_loss=0.2737, pruned_loss=0.04934, over 12040.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2524, pruned_loss=0.036, over 2387969.47 frames. ], batch size: 40, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:54:46,122 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-18 20:54:51,055 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.488e+02 2.575e+02 3.256e+02 3.752e+02 1.185e+03, threshold=6.512e+02, percent-clipped=4.0 2023-05-18 20:55:11,832 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7573, 2.8404, 4.5522, 4.7176, 2.8756, 2.7068, 3.0285, 2.2677], device='cuda:1'), covar=tensor([0.1729, 0.3054, 0.0495, 0.0484, 0.1410, 0.2605, 0.2849, 0.4067], device='cuda:1'), in_proj_covar=tensor([0.0309, 0.0396, 0.0280, 0.0307, 0.0280, 0.0325, 0.0406, 0.0385], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:55:14,433 INFO [finetune.py:992] (1/2) Epoch 18, batch 2450, loss[loss=0.1483, simple_loss=0.2408, pruned_loss=0.02788, over 12190.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2531, pruned_loss=0.03603, over 2390202.16 frames. ], batch size: 31, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:55:41,213 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 20:55:49,099 INFO [finetune.py:992] (1/2) Epoch 18, batch 2500, loss[loss=0.1466, simple_loss=0.2325, pruned_loss=0.03031, over 12419.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2536, pruned_loss=0.03607, over 2396668.36 frames. 
], batch size: 32, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:56:00,812 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.642e+02 2.609e+02 3.033e+02 3.813e+02 1.184e+03, threshold=6.066e+02, percent-clipped=3.0 2023-05-18 20:56:04,493 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310502.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:10,841 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310511.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:24,094 INFO [finetune.py:992] (1/2) Epoch 18, batch 2550, loss[loss=0.1398, simple_loss=0.2219, pruned_loss=0.02886, over 12190.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2532, pruned_loss=0.03607, over 2393568.19 frames. ], batch size: 29, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:56:39,238 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310550.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:39,623 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-05-18 20:56:45,458 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310559.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:56:47,232 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-18 20:57:00,026 INFO [finetune.py:992] (1/2) Epoch 18, batch 2600, loss[loss=0.131, simple_loss=0.2099, pruned_loss=0.0261, over 12277.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.253, pruned_loss=0.03626, over 2388452.18 frames. ], batch size: 28, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:57:11,844 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.539e+02 2.969e+02 3.489e+02 1.170e+03, threshold=5.939e+02, percent-clipped=2.0 2023-05-18 20:57:31,039 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2602, 6.0263, 5.6335, 5.6139, 6.1114, 5.2554, 5.6180, 5.6561], device='cuda:1'), covar=tensor([0.1481, 0.0811, 0.1096, 0.1808, 0.1028, 0.2473, 0.1656, 0.1095], device='cuda:1'), in_proj_covar=tensor([0.0374, 0.0517, 0.0417, 0.0467, 0.0478, 0.0460, 0.0421, 0.0400], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:57:35,064 INFO [finetune.py:992] (1/2) Epoch 18, batch 2650, loss[loss=0.1413, simple_loss=0.2281, pruned_loss=0.02727, over 12023.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2525, pruned_loss=0.03618, over 2389767.94 frames. 
], batch size: 31, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:57:40,077 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310637.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:57:44,156 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=310643.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:57:48,525 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=310649.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:58:08,085 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3840, 2.8160, 3.9136, 3.2834, 3.6966, 3.4793, 2.9529, 3.8257], device='cuda:1'), covar=tensor([0.0146, 0.0408, 0.0168, 0.0293, 0.0204, 0.0209, 0.0355, 0.0144], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0216, 0.0203, 0.0198, 0.0231, 0.0177, 0.0209, 0.0201], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:58:09,950 INFO [finetune.py:992] (1/2) Epoch 18, batch 2700, loss[loss=0.1734, simple_loss=0.2642, pruned_loss=0.04129, over 12015.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2521, pruned_loss=0.03582, over 2389281.91 frames. ], batch size: 40, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:58:14,147 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310685.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:58:15,693 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5207, 2.5303, 3.1727, 4.3655, 2.4899, 4.3763, 4.5483, 4.5418], device='cuda:1'), covar=tensor([0.0160, 0.1344, 0.0566, 0.0170, 0.1336, 0.0223, 0.0140, 0.0127], device='cuda:1'), in_proj_covar=tensor([0.0125, 0.0207, 0.0185, 0.0124, 0.0190, 0.0183, 0.0182, 0.0127], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 20:58:22,056 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.573e+02 2.755e+02 3.171e+02 3.711e+02 7.939e+02, threshold=6.342e+02, percent-clipped=1.0 2023-05-18 20:58:31,461 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310710.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:58:45,104 INFO [finetune.py:992] (1/2) Epoch 18, batch 2750, loss[loss=0.1356, simple_loss=0.2216, pruned_loss=0.02479, over 12200.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2531, pruned_loss=0.03613, over 2379610.63 frames. 
], batch size: 29, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:59:11,895 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1445, 4.5311, 2.7350, 2.3718, 3.9483, 2.5696, 3.8467, 3.1180], device='cuda:1'), covar=tensor([0.0856, 0.0655, 0.1287, 0.1743, 0.0343, 0.1364, 0.0497, 0.0852], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0259, 0.0180, 0.0201, 0.0143, 0.0185, 0.0203, 0.0177], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:59:15,842 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2082, 5.0990, 5.0409, 5.0784, 4.7227, 5.2448, 5.1560, 5.3507], device='cuda:1'), covar=tensor([0.0295, 0.0147, 0.0176, 0.0348, 0.0723, 0.0274, 0.0146, 0.0178], device='cuda:1'), in_proj_covar=tensor([0.0205, 0.0204, 0.0198, 0.0255, 0.0251, 0.0231, 0.0184, 0.0238], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 20:59:19,901 INFO [finetune.py:992] (1/2) Epoch 18, batch 2800, loss[loss=0.1294, simple_loss=0.2154, pruned_loss=0.02166, over 12280.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2526, pruned_loss=0.03611, over 2374496.04 frames. ], batch size: 28, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 20:59:29,750 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9722, 4.8464, 4.7925, 4.8807, 4.4649, 5.0370, 4.9487, 5.1440], device='cuda:1'), covar=tensor([0.0313, 0.0177, 0.0195, 0.0339, 0.0868, 0.0301, 0.0176, 0.0203], device='cuda:1'), in_proj_covar=tensor([0.0205, 0.0204, 0.0198, 0.0254, 0.0251, 0.0231, 0.0184, 0.0238], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 20:59:31,744 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.653e+02 3.113e+02 3.609e+02 5.905e+02, threshold=6.226e+02, percent-clipped=0.0 2023-05-18 20:59:34,322 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.9049, 5.9100, 5.6402, 5.2113, 5.1559, 5.8253, 5.3936, 5.2031], device='cuda:1'), covar=tensor([0.0802, 0.0984, 0.0825, 0.1838, 0.0848, 0.0776, 0.1649, 0.1290], device='cuda:1'), in_proj_covar=tensor([0.0650, 0.0583, 0.0535, 0.0658, 0.0436, 0.0754, 0.0811, 0.0586], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 20:59:36,491 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=310803.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 20:59:44,767 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4582, 5.2749, 5.3747, 5.4356, 5.0689, 5.0714, 4.7953, 5.3484], device='cuda:1'), covar=tensor([0.0665, 0.0596, 0.0885, 0.0578, 0.1971, 0.1379, 0.0555, 0.1170], device='cuda:1'), in_proj_covar=tensor([0.0570, 0.0736, 0.0653, 0.0656, 0.0883, 0.0785, 0.0587, 0.0507], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 20:59:56,135 INFO [finetune.py:992] (1/2) Epoch 18, batch 2850, loss[loss=0.1483, simple_loss=0.2412, pruned_loss=0.02767, over 12124.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2532, pruned_loss=0.03647, over 2370668.90 frames. 
], batch size: 33, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:00:19,944 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=310864.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:00:30,600 INFO [finetune.py:992] (1/2) Epoch 18, batch 2900, loss[loss=0.1806, simple_loss=0.279, pruned_loss=0.04113, over 12013.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2532, pruned_loss=0.03642, over 2371223.96 frames. ], batch size: 42, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:00:42,443 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.591e+02 3.025e+02 3.364e+02 5.558e+02, threshold=6.049e+02, percent-clipped=0.0 2023-05-18 21:00:46,227 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0861, 4.9675, 4.9288, 4.9606, 4.6232, 5.1361, 5.1113, 5.3008], device='cuda:1'), covar=tensor([0.0264, 0.0161, 0.0188, 0.0358, 0.0706, 0.0336, 0.0164, 0.0177], device='cuda:1'), in_proj_covar=tensor([0.0204, 0.0203, 0.0197, 0.0254, 0.0249, 0.0230, 0.0183, 0.0237], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 21:01:04,733 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8505, 3.2529, 2.4943, 2.1512, 2.9706, 2.3419, 3.0983, 2.6670], device='cuda:1'), covar=tensor([0.0627, 0.0634, 0.0875, 0.1374, 0.0288, 0.1108, 0.0570, 0.0754], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0260, 0.0179, 0.0202, 0.0144, 0.0186, 0.0204, 0.0177], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:01:05,199 INFO [finetune.py:992] (1/2) Epoch 18, batch 2950, loss[loss=0.231, simple_loss=0.3104, pruned_loss=0.07579, over 7904.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2524, pruned_loss=0.03612, over 2375208.88 frames. ], batch size: 98, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:01:14,314 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=310943.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:01:17,485 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.31 vs. limit=5.0 2023-05-18 21:01:40,649 INFO [finetune.py:992] (1/2) Epoch 18, batch 3000, loss[loss=0.1482, simple_loss=0.2409, pruned_loss=0.02776, over 12023.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03626, over 2375826.62 frames. ], batch size: 31, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:01:40,650 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-18 21:01:58,644 INFO [finetune.py:1026] (1/2) Epoch 18, validation: loss=0.3133, simple_loss=0.3898, pruned_loss=0.1184, over 1020973.00 frames. 2023-05-18 21:01:58,645 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12292MB 2023-05-18 21:02:06,226 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=310991.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:02:10,466 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.831e+02 3.206e+02 3.892e+02 6.977e+02, threshold=6.412e+02, percent-clipped=3.0 2023-05-18 21:02:16,463 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311005.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:02:16,710 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. 
limit=2.0 2023-05-18 21:02:32,481 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311028.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:02:33,745 INFO [finetune.py:992] (1/2) Epoch 18, batch 3050, loss[loss=0.1274, simple_loss=0.2039, pruned_loss=0.02549, over 12028.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2513, pruned_loss=0.03577, over 2385372.86 frames. ], batch size: 28, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:02:58,954 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.64 vs. limit=5.0 2023-05-18 21:03:09,063 INFO [finetune.py:992] (1/2) Epoch 18, batch 3100, loss[loss=0.1672, simple_loss=0.2629, pruned_loss=0.03575, over 12350.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2521, pruned_loss=0.03646, over 2376904.26 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:03:16,011 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311089.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:03:21,383 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.869e+02 2.622e+02 2.881e+02 3.500e+02 8.081e+02, threshold=5.763e+02, percent-clipped=2.0 2023-05-18 21:03:44,201 INFO [finetune.py:992] (1/2) Epoch 18, batch 3150, loss[loss=0.1524, simple_loss=0.2362, pruned_loss=0.03437, over 12182.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2519, pruned_loss=0.03653, over 2366013.16 frames. ], batch size: 29, lr: 3.25e-03, grad_scale: 16.0 2023-05-18 21:03:48,430 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3780, 2.6537, 3.9654, 3.2363, 3.7629, 3.4609, 2.9100, 3.8801], device='cuda:1'), covar=tensor([0.0139, 0.0403, 0.0138, 0.0274, 0.0150, 0.0194, 0.0392, 0.0130], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0218, 0.0205, 0.0199, 0.0232, 0.0178, 0.0210, 0.0202], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:04:04,461 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311159.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:04:08,715 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7213, 3.4560, 3.5593, 3.7423, 3.4403, 3.9042, 3.8248, 3.9359], device='cuda:1'), covar=tensor([0.0523, 0.0325, 0.0297, 0.0510, 0.0721, 0.0544, 0.0308, 0.0294], device='cuda:1'), in_proj_covar=tensor([0.0200, 0.0200, 0.0194, 0.0250, 0.0245, 0.0225, 0.0180, 0.0233], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:1') 2023-05-18 21:04:10,835 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9938, 4.6651, 4.8069, 4.9429, 4.6518, 4.8903, 4.7975, 2.7062], device='cuda:1'), covar=tensor([0.0116, 0.0073, 0.0086, 0.0056, 0.0051, 0.0098, 0.0100, 0.0838], device='cuda:1'), in_proj_covar=tensor([0.0072, 0.0081, 0.0085, 0.0076, 0.0063, 0.0097, 0.0085, 0.0101], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:04:18,881 INFO [finetune.py:992] (1/2) Epoch 18, batch 3200, loss[loss=0.1597, simple_loss=0.2582, pruned_loss=0.03065, over 11662.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2515, pruned_loss=0.03617, over 2366654.35 frames. ], batch size: 48, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:04:29,802 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. 
limit=2.0 2023-05-18 21:04:31,362 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.620e+02 3.041e+02 3.528e+02 9.797e+02, threshold=6.082e+02, percent-clipped=4.0 2023-05-18 21:04:45,697 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-05-18 21:04:54,805 INFO [finetune.py:992] (1/2) Epoch 18, batch 3250, loss[loss=0.1612, simple_loss=0.259, pruned_loss=0.03171, over 12202.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2518, pruned_loss=0.03633, over 2367966.97 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:05:29,466 INFO [finetune.py:992] (1/2) Epoch 18, batch 3300, loss[loss=0.155, simple_loss=0.2571, pruned_loss=0.02641, over 12345.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2515, pruned_loss=0.03617, over 2371198.41 frames. ], batch size: 36, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:05:40,859 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2270, 2.4548, 3.0816, 4.1312, 2.1728, 4.0907, 4.2300, 4.2696], device='cuda:1'), covar=tensor([0.0173, 0.1271, 0.0552, 0.0169, 0.1515, 0.0319, 0.0166, 0.0138], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0209, 0.0187, 0.0125, 0.0193, 0.0185, 0.0184, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:05:42,079 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.684e+02 3.108e+02 3.750e+02 5.472e+02, threshold=6.215e+02, percent-clipped=0.0 2023-05-18 21:05:47,196 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311305.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:06:04,699 INFO [finetune.py:992] (1/2) Epoch 18, batch 3350, loss[loss=0.1565, simple_loss=0.2484, pruned_loss=0.0323, over 12336.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2508, pruned_loss=0.03612, over 2368569.42 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:06:06,291 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311332.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:06:20,396 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=311353.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:06:40,384 INFO [finetune.py:992] (1/2) Epoch 18, batch 3400, loss[loss=0.1445, simple_loss=0.2276, pruned_loss=0.03073, over 12170.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2516, pruned_loss=0.03651, over 2372120.66 frames. 
], batch size: 29, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:06:43,293 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311384.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:06:48,279 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7266, 3.7353, 3.3011, 3.2667, 3.0451, 3.0003, 3.7483, 2.4944], device='cuda:1'), covar=tensor([0.0408, 0.0141, 0.0256, 0.0245, 0.0432, 0.0398, 0.0182, 0.0527], device='cuda:1'), in_proj_covar=tensor([0.0200, 0.0170, 0.0172, 0.0198, 0.0208, 0.0207, 0.0182, 0.0211], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:06:49,624 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311393.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:06:52,753 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.523e+02 3.046e+02 3.559e+02 6.317e+02, threshold=6.092e+02, percent-clipped=2.0 2023-05-18 21:07:03,718 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6375, 4.2485, 4.4731, 4.7857, 3.3219, 4.1966, 2.9451, 4.3977], device='cuda:1'), covar=tensor([0.1609, 0.0785, 0.0742, 0.0496, 0.1214, 0.0600, 0.1847, 0.1268], device='cuda:1'), in_proj_covar=tensor([0.0233, 0.0273, 0.0303, 0.0362, 0.0248, 0.0247, 0.0264, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:07:15,199 INFO [finetune.py:992] (1/2) Epoch 18, batch 3450, loss[loss=0.1595, simple_loss=0.2577, pruned_loss=0.03064, over 12187.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2516, pruned_loss=0.03621, over 2373503.73 frames. ], batch size: 35, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:07:20,935 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2594, 6.0966, 5.7329, 5.6647, 6.1805, 5.4594, 5.6837, 5.6896], device='cuda:1'), covar=tensor([0.1463, 0.0821, 0.1057, 0.1772, 0.0861, 0.2317, 0.1757, 0.1165], device='cuda:1'), in_proj_covar=tensor([0.0370, 0.0516, 0.0415, 0.0461, 0.0475, 0.0458, 0.0417, 0.0398], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:07:35,795 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311459.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:07:35,851 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4974, 2.6638, 3.1443, 4.3740, 2.5217, 4.2946, 4.5368, 4.5346], device='cuda:1'), covar=tensor([0.0155, 0.1236, 0.0590, 0.0172, 0.1308, 0.0288, 0.0185, 0.0131], device='cuda:1'), in_proj_covar=tensor([0.0127, 0.0209, 0.0187, 0.0125, 0.0193, 0.0185, 0.0184, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:07:50,028 INFO [finetune.py:992] (1/2) Epoch 18, batch 3500, loss[loss=0.1535, simple_loss=0.2344, pruned_loss=0.03628, over 12142.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2515, pruned_loss=0.03588, over 2377399.39 frames. ], batch size: 30, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:07:56,604 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.85 vs. 
limit=5.0 2023-05-18 21:08:02,427 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.944e+02 2.630e+02 3.039e+02 3.536e+02 5.562e+02, threshold=6.078e+02, percent-clipped=0.0 2023-05-18 21:08:08,699 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=311507.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:08:25,653 INFO [finetune.py:992] (1/2) Epoch 18, batch 3550, loss[loss=0.1364, simple_loss=0.2118, pruned_loss=0.03053, over 12161.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03581, over 2375896.04 frames. ], batch size: 29, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:08:36,812 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2663, 5.0509, 5.1907, 5.2389, 4.8311, 4.9197, 4.6413, 5.1640], device='cuda:1'), covar=tensor([0.0798, 0.0670, 0.0910, 0.0635, 0.1965, 0.1403, 0.0552, 0.1070], device='cuda:1'), in_proj_covar=tensor([0.0567, 0.0737, 0.0649, 0.0657, 0.0877, 0.0781, 0.0583, 0.0503], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:08:41,489 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4153, 2.5115, 3.7487, 4.4592, 3.9274, 4.4453, 3.8904, 3.1145], device='cuda:1'), covar=tensor([0.0041, 0.0414, 0.0139, 0.0037, 0.0115, 0.0074, 0.0119, 0.0376], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0108, 0.0082, 0.0109, 0.0120, 0.0105, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:08:47,486 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 21:08:51,221 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311567.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:08:59,993 INFO [finetune.py:992] (1/2) Epoch 18, batch 3600, loss[loss=0.1737, simple_loss=0.2629, pruned_loss=0.04224, over 11767.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2525, pruned_loss=0.03613, over 2381646.02 frames. ], batch size: 44, lr: 3.25e-03, grad_scale: 8.0 2023-05-18 21:09:12,579 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.695e+02 3.162e+02 3.738e+02 6.009e+02, threshold=6.324e+02, percent-clipped=0.0 2023-05-18 21:09:34,225 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311628.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:09:35,386 INFO [finetune.py:992] (1/2) Epoch 18, batch 3650, loss[loss=0.1399, simple_loss=0.2316, pruned_loss=0.02404, over 12266.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2518, pruned_loss=0.03596, over 2376224.50 frames. ], batch size: 28, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:09:45,749 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.64 vs. limit=2.0 2023-05-18 21:09:51,897 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2059, 5.0831, 5.1760, 5.2245, 4.8636, 4.9097, 4.6659, 5.1377], device='cuda:1'), covar=tensor([0.0846, 0.0645, 0.0947, 0.0614, 0.1827, 0.1490, 0.0564, 0.1057], device='cuda:1'), in_proj_covar=tensor([0.0569, 0.0740, 0.0654, 0.0661, 0.0884, 0.0786, 0.0588, 0.0508], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:10:11,025 INFO [finetune.py:992] (1/2) Epoch 18, batch 3700, loss[loss=0.1799, simple_loss=0.2778, pruned_loss=0.04096, over 12042.00 frames. 
], tot_loss[loss=0.1619, simple_loss=0.2516, pruned_loss=0.03615, over 2380038.04 frames. ], batch size: 40, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:10:13,799 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311684.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:14,566 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311685.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:16,487 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311688.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:10:18,122 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-05-18 21:10:23,221 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.626e+02 3.034e+02 3.864e+02 3.281e+03, threshold=6.068e+02, percent-clipped=3.0 2023-05-18 21:10:25,940 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 21:10:39,408 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1627, 4.6916, 5.1582, 4.4536, 4.8304, 4.6012, 5.1790, 4.8195], device='cuda:1'), covar=tensor([0.0289, 0.0445, 0.0307, 0.0308, 0.0429, 0.0373, 0.0213, 0.0392], device='cuda:1'), in_proj_covar=tensor([0.0281, 0.0287, 0.0309, 0.0281, 0.0281, 0.0278, 0.0252, 0.0227], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:10:43,091 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311726.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:45,593 INFO [finetune.py:992] (1/2) Epoch 18, batch 3750, loss[loss=0.1704, simple_loss=0.2503, pruned_loss=0.04525, over 8706.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2526, pruned_loss=0.03656, over 2368402.71 frames. ], batch size: 98, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:10:46,963 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=311732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:10:56,724 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311746.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:11:19,900 INFO [finetune.py:992] (1/2) Epoch 18, batch 3800, loss[loss=0.1724, simple_loss=0.2795, pruned_loss=0.03266, over 12155.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2522, pruned_loss=0.03649, over 2366402.68 frames. ], batch size: 34, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:11:25,088 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311787.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:11:32,676 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.626e+02 3.008e+02 3.427e+02 6.541e+02, threshold=6.016e+02, percent-clipped=1.0 2023-05-18 21:11:56,539 INFO [finetune.py:992] (1/2) Epoch 18, batch 3850, loss[loss=0.1924, simple_loss=0.2873, pruned_loss=0.04878, over 12301.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2519, pruned_loss=0.03639, over 2376555.81 frames. ], batch size: 34, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:12:13,654 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=311854.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:12:31,632 INFO [finetune.py:992] (1/2) Epoch 18, batch 3900, loss[loss=0.2109, simple_loss=0.2838, pruned_loss=0.06898, over 8101.00 frames. 
], tot_loss[loss=0.1635, simple_loss=0.253, pruned_loss=0.03697, over 2372312.21 frames. ], batch size: 98, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:12:44,049 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.717e+02 3.082e+02 3.749e+02 5.734e+02, threshold=6.165e+02, percent-clipped=0.0 2023-05-18 21:12:56,283 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=311915.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:13:01,704 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=311923.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:13:06,272 INFO [finetune.py:992] (1/2) Epoch 18, batch 3950, loss[loss=0.1534, simple_loss=0.2396, pruned_loss=0.03357, over 12152.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2528, pruned_loss=0.03695, over 2382382.19 frames. ], batch size: 30, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:13:41,836 INFO [finetune.py:992] (1/2) Epoch 18, batch 4000, loss[loss=0.1622, simple_loss=0.2573, pruned_loss=0.03352, over 11657.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2535, pruned_loss=0.03705, over 2377517.28 frames. ], batch size: 48, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:13:47,350 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=311988.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:13:54,050 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 2.734e+02 3.153e+02 3.912e+02 1.026e+03, threshold=6.306e+02, percent-clipped=4.0 2023-05-18 21:14:19,521 INFO [finetune.py:992] (1/2) Epoch 18, batch 4050, loss[loss=0.1362, simple_loss=0.2232, pruned_loss=0.02459, over 12339.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2532, pruned_loss=0.03693, over 2371560.00 frames. ], batch size: 31, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:14:23,840 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312036.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:14:27,423 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312041.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:14:54,377 INFO [finetune.py:992] (1/2) Epoch 18, batch 4100, loss[loss=0.1651, simple_loss=0.2563, pruned_loss=0.03696, over 12318.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2518, pruned_loss=0.03653, over 2370270.15 frames. ], batch size: 34, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:14:55,834 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312082.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:15:06,868 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.580e+02 3.050e+02 3.639e+02 7.290e+02, threshold=6.100e+02, percent-clipped=1.0 2023-05-18 21:15:26,918 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4302, 3.9897, 4.1093, 4.4039, 3.1440, 3.9217, 2.7336, 3.9635], device='cuda:1'), covar=tensor([0.1631, 0.0820, 0.0864, 0.0782, 0.1282, 0.0727, 0.1969, 0.1138], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0276, 0.0306, 0.0367, 0.0249, 0.0249, 0.0267, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:15:30,307 INFO [finetune.py:992] (1/2) Epoch 18, batch 4150, loss[loss=0.1747, simple_loss=0.2663, pruned_loss=0.04156, over 11777.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2522, pruned_loss=0.03632, over 2370509.02 frames. 
], batch size: 44, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:15:33,374 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6072, 3.0778, 3.8455, 4.6076, 4.0311, 4.7494, 4.0076, 3.4901], device='cuda:1'), covar=tensor([0.0043, 0.0331, 0.0137, 0.0046, 0.0129, 0.0056, 0.0123, 0.0321], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0107, 0.0081, 0.0107, 0.0119, 0.0104, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:15:51,065 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.0577, 2.1436, 2.9542, 2.9299, 2.9936, 3.0989, 2.9082, 2.5036], device='cuda:1'), covar=tensor([0.0101, 0.0425, 0.0186, 0.0106, 0.0168, 0.0114, 0.0161, 0.0381], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0107, 0.0081, 0.0108, 0.0119, 0.0105, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:16:04,694 INFO [finetune.py:992] (1/2) Epoch 18, batch 4200, loss[loss=0.1316, simple_loss=0.2146, pruned_loss=0.02428, over 12189.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2517, pruned_loss=0.03624, over 2365422.61 frames. ], batch size: 31, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:16:17,284 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.655e+02 3.216e+02 3.929e+02 7.949e+02, threshold=6.433e+02, percent-clipped=1.0 2023-05-18 21:16:19,070 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3399, 5.0581, 5.1790, 5.2070, 5.0450, 5.2973, 5.2314, 2.9669], device='cuda:1'), covar=tensor([0.0084, 0.0061, 0.0072, 0.0053, 0.0045, 0.0094, 0.0068, 0.0740], device='cuda:1'), in_proj_covar=tensor([0.0073, 0.0083, 0.0087, 0.0077, 0.0063, 0.0098, 0.0086, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:16:24,042 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5207, 4.8052, 3.1419, 2.7550, 4.1987, 2.8022, 4.1361, 3.6020], device='cuda:1'), covar=tensor([0.0611, 0.0520, 0.0903, 0.1433, 0.0292, 0.1252, 0.0414, 0.0655], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0261, 0.0180, 0.0202, 0.0144, 0.0186, 0.0203, 0.0176], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:16:25,927 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312210.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:16:34,882 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312223.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:16:37,976 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.55 vs. limit=5.0 2023-05-18 21:16:39,706 INFO [finetune.py:992] (1/2) Epoch 18, batch 4250, loss[loss=0.1561, simple_loss=0.2517, pruned_loss=0.03025, over 12342.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2522, pruned_loss=0.03625, over 2363366.27 frames. 
], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:16:44,095 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4159, 4.7958, 2.9823, 2.8843, 4.0479, 2.7237, 4.1319, 3.5433], device='cuda:1'), covar=tensor([0.0720, 0.0554, 0.1218, 0.1466, 0.0351, 0.1436, 0.0482, 0.0733], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0261, 0.0180, 0.0202, 0.0143, 0.0185, 0.0202, 0.0176], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:16:48,147 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. limit=2.0 2023-05-18 21:17:09,260 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312271.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:17:15,232 INFO [finetune.py:992] (1/2) Epoch 18, batch 4300, loss[loss=0.162, simple_loss=0.2517, pruned_loss=0.03614, over 11785.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2522, pruned_loss=0.03624, over 2360158.94 frames. ], batch size: 26, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:17:15,394 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5627, 5.0371, 5.5158, 4.8022, 5.1166, 4.9153, 5.5741, 5.1850], device='cuda:1'), covar=tensor([0.0277, 0.0437, 0.0292, 0.0265, 0.0474, 0.0349, 0.0209, 0.0275], device='cuda:1'), in_proj_covar=tensor([0.0280, 0.0286, 0.0309, 0.0280, 0.0280, 0.0277, 0.0251, 0.0226], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:17:18,221 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4340, 5.2604, 5.3785, 5.4213, 5.0720, 5.0921, 4.8871, 5.3351], device='cuda:1'), covar=tensor([0.0789, 0.0656, 0.0860, 0.0661, 0.1995, 0.1503, 0.0561, 0.1136], device='cuda:1'), in_proj_covar=tensor([0.0571, 0.0743, 0.0654, 0.0664, 0.0884, 0.0785, 0.0588, 0.0510], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:17:27,617 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.814e+02 3.252e+02 3.764e+02 1.098e+03, threshold=6.505e+02, percent-clipped=2.0 2023-05-18 21:17:49,329 INFO [finetune.py:992] (1/2) Epoch 18, batch 4350, loss[loss=0.1806, simple_loss=0.2666, pruned_loss=0.04731, over 11910.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2529, pruned_loss=0.03683, over 2351534.87 frames. ], batch size: 44, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:17:57,264 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312341.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:18:21,687 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4117, 4.7854, 2.9527, 2.8111, 4.0560, 2.9687, 4.0522, 3.6357], device='cuda:1'), covar=tensor([0.0671, 0.0493, 0.1076, 0.1386, 0.0332, 0.1102, 0.0482, 0.0627], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0260, 0.0179, 0.0202, 0.0144, 0.0185, 0.0202, 0.0175], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:18:24,913 INFO [finetune.py:992] (1/2) Epoch 18, batch 4400, loss[loss=0.1557, simple_loss=0.2474, pruned_loss=0.03195, over 12245.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2523, pruned_loss=0.03644, over 2350120.36 frames. 
], batch size: 32, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:18:26,427 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312382.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:18:31,791 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312389.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:18:37,865 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.591e+02 3.116e+02 3.636e+02 1.408e+03, threshold=6.232e+02, percent-clipped=3.0 2023-05-18 21:18:47,698 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. limit=2.0 2023-05-18 21:19:00,511 INFO [finetune.py:992] (1/2) Epoch 18, batch 4450, loss[loss=0.1527, simple_loss=0.2457, pruned_loss=0.02979, over 12286.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2521, pruned_loss=0.0363, over 2362061.17 frames. ], batch size: 34, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:19:00,567 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312430.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:19:00,653 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.6415, 5.4410, 5.5482, 5.5892, 5.2100, 5.2700, 5.0698, 5.5192], device='cuda:1'), covar=tensor([0.0641, 0.0584, 0.0742, 0.0559, 0.1847, 0.1300, 0.0500, 0.1085], device='cuda:1'), in_proj_covar=tensor([0.0568, 0.0739, 0.0649, 0.0661, 0.0879, 0.0784, 0.0587, 0.0507], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:19:35,204 INFO [finetune.py:992] (1/2) Epoch 18, batch 4500, loss[loss=0.1616, simple_loss=0.2509, pruned_loss=0.03615, over 12202.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2523, pruned_loss=0.0364, over 2368550.42 frames. ], batch size: 35, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:19:47,743 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.715e+02 3.190e+02 4.026e+02 2.285e+03, threshold=6.379e+02, percent-clipped=2.0 2023-05-18 21:19:52,069 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9928, 2.4583, 3.5650, 2.9277, 3.3751, 3.1873, 2.4513, 3.4124], device='cuda:1'), covar=tensor([0.0186, 0.0426, 0.0197, 0.0322, 0.0207, 0.0228, 0.0492, 0.0179], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0219, 0.0207, 0.0201, 0.0234, 0.0180, 0.0213, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:19:56,198 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=312510.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:20:10,553 INFO [finetune.py:992] (1/2) Epoch 18, batch 4550, loss[loss=0.1849, simple_loss=0.2726, pruned_loss=0.04864, over 11257.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2528, pruned_loss=0.03677, over 2372081.66 frames. ], batch size: 55, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:20:30,598 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=312558.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:20:41,747 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=312574.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:20:45,711 INFO [finetune.py:992] (1/2) Epoch 18, batch 4600, loss[loss=0.1425, simple_loss=0.2312, pruned_loss=0.02693, over 12089.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2528, pruned_loss=0.03674, over 2371116.99 frames. 
], batch size: 32, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:20:47,418 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4854, 3.4999, 3.1359, 3.0443, 2.7877, 2.6141, 3.4802, 2.3648], device='cuda:1'), covar=tensor([0.0462, 0.0171, 0.0282, 0.0272, 0.0478, 0.0461, 0.0185, 0.0546], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0170, 0.0173, 0.0199, 0.0209, 0.0207, 0.0180, 0.0211], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:20:58,052 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.688e+02 3.108e+02 3.848e+02 5.963e+02, threshold=6.216e+02, percent-clipped=0.0 2023-05-18 21:21:20,502 INFO [finetune.py:992] (1/2) Epoch 18, batch 4650, loss[loss=0.1696, simple_loss=0.2625, pruned_loss=0.03839, over 12151.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2527, pruned_loss=0.03655, over 2376412.01 frames. ], batch size: 36, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:21:24,049 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=312635.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:21:26,100 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=312638.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:21:54,583 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-05-18 21:21:56,211 INFO [finetune.py:992] (1/2) Epoch 18, batch 4700, loss[loss=0.1678, simple_loss=0.2585, pruned_loss=0.03853, over 12291.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2521, pruned_loss=0.03633, over 2369508.87 frames. ], batch size: 37, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:21:58,457 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9484, 4.5401, 4.6705, 4.7847, 4.6603, 4.9036, 4.7765, 2.5164], device='cuda:1'), covar=tensor([0.0103, 0.0094, 0.0106, 0.0072, 0.0057, 0.0102, 0.0107, 0.0957], device='cuda:1'), in_proj_covar=tensor([0.0073, 0.0082, 0.0086, 0.0077, 0.0063, 0.0097, 0.0085, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:22:08,468 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.813e+02 2.540e+02 2.971e+02 3.808e+02 6.706e+02, threshold=5.942e+02, percent-clipped=2.0 2023-05-18 21:22:09,360 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=312699.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:22:26,140 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4054, 4.9043, 5.4138, 4.6842, 4.9983, 4.7859, 5.4189, 5.0301], device='cuda:1'), covar=tensor([0.0279, 0.0446, 0.0274, 0.0283, 0.0495, 0.0332, 0.0207, 0.0372], device='cuda:1'), in_proj_covar=tensor([0.0285, 0.0291, 0.0313, 0.0283, 0.0285, 0.0281, 0.0256, 0.0230], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:22:30,885 INFO [finetune.py:992] (1/2) Epoch 18, batch 4750, loss[loss=0.1531, simple_loss=0.2424, pruned_loss=0.03189, over 12191.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2523, pruned_loss=0.03648, over 2373905.81 frames. 
], batch size: 29, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:22:31,098 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4514, 2.3017, 3.2011, 4.2334, 2.2967, 4.3478, 4.4396, 4.4644], device='cuda:1'), covar=tensor([0.0139, 0.1509, 0.0519, 0.0199, 0.1474, 0.0253, 0.0174, 0.0137], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0208, 0.0187, 0.0126, 0.0193, 0.0186, 0.0185, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:23:05,268 INFO [finetune.py:992] (1/2) Epoch 18, batch 4800, loss[loss=0.1438, simple_loss=0.2289, pruned_loss=0.02939, over 12019.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2532, pruned_loss=0.03687, over 2366038.77 frames. ], batch size: 28, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:23:11,150 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.3505, 4.0453, 4.0987, 4.4765, 2.7430, 4.0461, 2.7014, 4.1200], device='cuda:1'), covar=tensor([0.1542, 0.0743, 0.0847, 0.0574, 0.1360, 0.0555, 0.1849, 0.1167], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0274, 0.0306, 0.0366, 0.0248, 0.0249, 0.0267, 0.0378], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:23:17,508 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.937e+02 2.574e+02 2.999e+02 3.689e+02 9.471e+02, threshold=5.998e+02, percent-clipped=4.0 2023-05-18 21:23:23,408 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.61 vs. limit=5.0 2023-05-18 21:23:40,817 INFO [finetune.py:992] (1/2) Epoch 18, batch 4850, loss[loss=0.1726, simple_loss=0.2643, pruned_loss=0.04047, over 12359.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2536, pruned_loss=0.03705, over 2364614.74 frames. ], batch size: 38, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:24:09,503 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-18 21:24:15,422 INFO [finetune.py:992] (1/2) Epoch 18, batch 4900, loss[loss=0.1755, simple_loss=0.2662, pruned_loss=0.04244, over 12043.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2535, pruned_loss=0.03676, over 2362789.17 frames. ], batch size: 42, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:24:21,946 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 21:24:27,887 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.600e+02 3.047e+02 3.560e+02 6.137e+02, threshold=6.093e+02, percent-clipped=1.0 2023-05-18 21:24:50,415 INFO [finetune.py:992] (1/2) Epoch 18, batch 4950, loss[loss=0.1516, simple_loss=0.2377, pruned_loss=0.03278, over 12332.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2527, pruned_loss=0.03629, over 2373111.13 frames. ], batch size: 31, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:24:50,481 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312930.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:25:26,155 INFO [finetune.py:992] (1/2) Epoch 18, batch 5000, loss[loss=0.1602, simple_loss=0.2597, pruned_loss=0.03038, over 12197.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2529, pruned_loss=0.03625, over 2370160.72 frames. 
], batch size: 35, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:25:35,982 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=312994.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:25:38,717 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.662e+02 3.140e+02 3.645e+02 1.191e+03, threshold=6.280e+02, percent-clipped=2.0 2023-05-18 21:25:38,893 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=312998.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:26:01,186 INFO [finetune.py:992] (1/2) Epoch 18, batch 5050, loss[loss=0.1819, simple_loss=0.2747, pruned_loss=0.04452, over 12366.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2536, pruned_loss=0.03659, over 2368905.51 frames. ], batch size: 38, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:26:08,712 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-05-18 21:26:21,312 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313059.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:26:35,688 INFO [finetune.py:992] (1/2) Epoch 18, batch 5100, loss[loss=0.1733, simple_loss=0.2639, pruned_loss=0.04139, over 12155.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2532, pruned_loss=0.03675, over 2366657.55 frames. ], batch size: 34, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:26:48,180 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.737e+02 3.179e+02 3.988e+02 8.125e+02, threshold=6.358e+02, percent-clipped=2.0 2023-05-18 21:27:12,273 INFO [finetune.py:992] (1/2) Epoch 18, batch 5150, loss[loss=0.1902, simple_loss=0.277, pruned_loss=0.05173, over 12147.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2535, pruned_loss=0.03676, over 2362934.22 frames. ], batch size: 39, lr: 3.24e-03, grad_scale: 8.0 2023-05-18 21:27:18,933 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9528, 3.5682, 5.3450, 2.7021, 3.0309, 3.8554, 3.3715, 3.8100], device='cuda:1'), covar=tensor([0.0406, 0.1147, 0.0207, 0.1248, 0.1987, 0.1533, 0.1361, 0.1326], device='cuda:1'), in_proj_covar=tensor([0.0245, 0.0245, 0.0268, 0.0190, 0.0248, 0.0305, 0.0233, 0.0278], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:27:42,965 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4452, 2.4147, 3.1555, 4.2458, 2.2934, 4.2931, 4.4662, 4.4701], device='cuda:1'), covar=tensor([0.0148, 0.1320, 0.0515, 0.0158, 0.1380, 0.0258, 0.0143, 0.0108], device='cuda:1'), in_proj_covar=tensor([0.0127, 0.0208, 0.0186, 0.0125, 0.0191, 0.0184, 0.0184, 0.0128], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:27:46,903 INFO [finetune.py:992] (1/2) Epoch 18, batch 5200, loss[loss=0.1603, simple_loss=0.2521, pruned_loss=0.03426, over 7864.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2537, pruned_loss=0.03672, over 2352278.87 frames. 
], batch size: 98, lr: 3.24e-03, grad_scale: 16.0 2023-05-18 21:27:59,475 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.725e+02 3.162e+02 3.690e+02 6.131e+02, threshold=6.324e+02, percent-clipped=0.0 2023-05-18 21:28:16,178 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2403, 4.8109, 5.2370, 4.5668, 4.8506, 4.6820, 5.2381, 4.8550], device='cuda:1'), covar=tensor([0.0274, 0.0409, 0.0287, 0.0267, 0.0461, 0.0323, 0.0231, 0.0401], device='cuda:1'), in_proj_covar=tensor([0.0282, 0.0288, 0.0309, 0.0281, 0.0281, 0.0279, 0.0255, 0.0230], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:28:17,711 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6116, 3.2765, 5.0694, 2.5286, 2.9870, 3.7416, 3.0329, 3.8306], device='cuda:1'), covar=tensor([0.0485, 0.1296, 0.0366, 0.1311, 0.1945, 0.1637, 0.1583, 0.1168], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0243, 0.0265, 0.0189, 0.0246, 0.0303, 0.0232, 0.0276], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:28:21,617 INFO [finetune.py:992] (1/2) Epoch 18, batch 5250, loss[loss=0.1475, simple_loss=0.2373, pruned_loss=0.0288, over 12092.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2541, pruned_loss=0.03687, over 2361299.93 frames. ], batch size: 32, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:28:21,794 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313230.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:28:50,048 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313269.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:28:51,395 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313271.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:28:56,129 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313278.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:28:57,510 INFO [finetune.py:992] (1/2) Epoch 18, batch 5300, loss[loss=0.1497, simple_loss=0.2434, pruned_loss=0.02797, over 12367.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.254, pruned_loss=0.03704, over 2369533.83 frames. ], batch size: 35, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:29:07,140 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313294.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:09,760 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.691e+02 3.047e+02 3.516e+02 5.591e+02, threshold=6.093e+02, percent-clipped=0.0 2023-05-18 21:29:31,942 INFO [finetune.py:992] (1/2) Epoch 18, batch 5350, loss[loss=0.145, simple_loss=0.2296, pruned_loss=0.03019, over 11419.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2531, pruned_loss=0.03663, over 2378802.38 frames. ], batch size: 25, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:29:32,179 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313330.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:33,464 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313332.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:29:40,099 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. 
limit=2.0 2023-05-18 21:29:40,434 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313342.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:46,167 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313350.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:48,881 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313354.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:29:52,611 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3984, 4.8329, 3.0171, 2.6957, 4.1034, 2.9489, 4.0227, 3.5461], device='cuda:1'), covar=tensor([0.0756, 0.0450, 0.1176, 0.1560, 0.0340, 0.1155, 0.0630, 0.0711], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0266, 0.0182, 0.0205, 0.0146, 0.0188, 0.0206, 0.0179], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:29:54,917 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-18 21:30:07,071 INFO [finetune.py:992] (1/2) Epoch 18, batch 5400, loss[loss=0.1572, simple_loss=0.2417, pruned_loss=0.03638, over 12011.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03651, over 2376379.91 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:30:07,204 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.2347, 6.1310, 5.9373, 5.4339, 5.2893, 6.1041, 5.7561, 5.5180], device='cuda:1'), covar=tensor([0.0642, 0.0969, 0.0662, 0.1734, 0.0740, 0.0675, 0.1462, 0.1023], device='cuda:1'), in_proj_covar=tensor([0.0655, 0.0580, 0.0536, 0.0657, 0.0440, 0.0754, 0.0811, 0.0589], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 21:30:07,322 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6873, 3.7563, 3.3304, 3.2486, 2.9748, 2.8977, 3.7466, 2.5769], device='cuda:1'), covar=tensor([0.0414, 0.0137, 0.0227, 0.0248, 0.0472, 0.0438, 0.0154, 0.0505], device='cuda:1'), in_proj_covar=tensor([0.0202, 0.0171, 0.0175, 0.0201, 0.0210, 0.0209, 0.0183, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:30:20,021 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.584e+02 2.894e+02 3.600e+02 7.897e+02, threshold=5.788e+02, percent-clipped=2.0 2023-05-18 21:30:30,006 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313411.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:30:31,358 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9243, 4.8142, 4.6974, 4.8179, 4.4222, 4.9650, 4.8096, 5.1772], device='cuda:1'), covar=tensor([0.0352, 0.0185, 0.0259, 0.0419, 0.0884, 0.0383, 0.0246, 0.0185], device='cuda:1'), in_proj_covar=tensor([0.0205, 0.0207, 0.0200, 0.0257, 0.0250, 0.0231, 0.0186, 0.0240], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 21:30:42,867 INFO [finetune.py:992] (1/2) Epoch 18, batch 5450, loss[loss=0.1502, simple_loss=0.2364, pruned_loss=0.03202, over 11360.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2527, pruned_loss=0.03634, over 2379537.99 frames. 
], batch size: 25, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:31:17,027 INFO [finetune.py:992] (1/2) Epoch 18, batch 5500, loss[loss=0.1948, simple_loss=0.2899, pruned_loss=0.04983, over 10393.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2527, pruned_loss=0.03647, over 2369070.24 frames. ], batch size: 68, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:31:20,218 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-18 21:31:23,475 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4187, 4.9257, 4.2159, 5.1678, 4.6253, 3.2472, 4.4983, 3.1663], device='cuda:1'), covar=tensor([0.0840, 0.0723, 0.1555, 0.0448, 0.1225, 0.1596, 0.0983, 0.3439], device='cuda:1'), in_proj_covar=tensor([0.0316, 0.0386, 0.0368, 0.0343, 0.0379, 0.0282, 0.0355, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:31:29,306 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.547e+02 3.061e+02 3.924e+02 7.161e+02, threshold=6.121e+02, percent-clipped=6.0 2023-05-18 21:31:43,478 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. limit=2.0 2023-05-18 21:31:51,524 INFO [finetune.py:992] (1/2) Epoch 18, batch 5550, loss[loss=0.169, simple_loss=0.2545, pruned_loss=0.04177, over 11646.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2522, pruned_loss=0.03637, over 2372959.05 frames. ], batch size: 48, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:32:16,091 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3593, 2.6671, 3.8120, 3.2227, 3.7137, 3.3744, 2.7819, 3.8114], device='cuda:1'), covar=tensor([0.0133, 0.0440, 0.0202, 0.0262, 0.0156, 0.0207, 0.0444, 0.0133], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0219, 0.0207, 0.0202, 0.0235, 0.0181, 0.0212, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:32:28,008 INFO [finetune.py:992] (1/2) Epoch 18, batch 5600, loss[loss=0.174, simple_loss=0.2593, pruned_loss=0.04435, over 12149.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2516, pruned_loss=0.03601, over 2369811.71 frames. ], batch size: 34, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:32:34,479 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.0994, 4.6701, 2.7562, 2.3025, 3.9609, 2.5132, 3.8793, 3.1249], device='cuda:1'), covar=tensor([0.0753, 0.0462, 0.1125, 0.1752, 0.0278, 0.1319, 0.0477, 0.0841], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0266, 0.0182, 0.0206, 0.0146, 0.0189, 0.0206, 0.0179], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:32:40,618 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.476e+02 2.852e+02 3.313e+02 5.624e+02, threshold=5.704e+02, percent-clipped=0.0 2023-05-18 21:32:57,160 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.30 vs. limit=5.0 2023-05-18 21:32:59,578 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313625.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:33:00,949 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313627.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:33:02,987 INFO [finetune.py:992] (1/2) Epoch 18, batch 5650, loss[loss=0.1376, simple_loss=0.2258, pruned_loss=0.0247, over 12009.00 frames. 
], tot_loss[loss=0.1629, simple_loss=0.2528, pruned_loss=0.03649, over 2370763.92 frames. ], batch size: 28, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:33:19,981 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313654.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:33:38,046 INFO [finetune.py:992] (1/2) Epoch 18, batch 5700, loss[loss=0.1717, simple_loss=0.2647, pruned_loss=0.03929, over 12373.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2527, pruned_loss=0.03624, over 2376153.60 frames. ], batch size: 38, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:33:51,687 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.770e+02 2.438e+02 2.940e+02 3.390e+02 5.441e+02, threshold=5.879e+02, percent-clipped=0.0 2023-05-18 21:33:54,471 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313702.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:33:57,334 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=313706.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:34:13,532 INFO [finetune.py:992] (1/2) Epoch 18, batch 5750, loss[loss=0.1405, simple_loss=0.2306, pruned_loss=0.02517, over 12125.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2527, pruned_loss=0.03637, over 2371465.14 frames. ], batch size: 30, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:34:23,594 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.41 vs. limit=5.0 2023-05-18 21:34:36,742 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.0984, 6.0742, 5.8439, 5.3355, 5.2414, 6.0055, 5.5665, 5.4220], device='cuda:1'), covar=tensor([0.0834, 0.1022, 0.0707, 0.1825, 0.0785, 0.0755, 0.1889, 0.1038], device='cuda:1'), in_proj_covar=tensor([0.0658, 0.0582, 0.0537, 0.0658, 0.0440, 0.0756, 0.0815, 0.0589], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 21:34:47,672 INFO [finetune.py:992] (1/2) Epoch 18, batch 5800, loss[loss=0.1888, simple_loss=0.2773, pruned_loss=0.0501, over 12122.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2534, pruned_loss=0.03682, over 2374535.58 frames. ], batch size: 39, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:34:57,578 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313794.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:00,095 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.651e+02 3.293e+02 3.863e+02 6.789e+02, threshold=6.585e+02, percent-clipped=3.0 2023-05-18 21:35:18,508 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313824.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:22,605 INFO [finetune.py:992] (1/2) Epoch 18, batch 5850, loss[loss=0.1321, simple_loss=0.2183, pruned_loss=0.02293, over 12127.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2525, pruned_loss=0.03662, over 2383811.24 frames. ], batch size: 30, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:35:41,062 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313855.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:47,368 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=313864.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:35:58,131 INFO [finetune.py:992] (1/2) Epoch 18, batch 5900, loss[loss=0.196, simple_loss=0.2815, pruned_loss=0.05523, over 11846.00 frames. 
], tot_loss[loss=0.1627, simple_loss=0.2523, pruned_loss=0.03655, over 2382495.50 frames. ], batch size: 44, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:35:58,365 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.3092, 3.4607, 3.1706, 3.1109, 2.7332, 2.6486, 3.5290, 2.4059], device='cuda:1'), covar=tensor([0.0491, 0.0172, 0.0219, 0.0249, 0.0414, 0.0396, 0.0154, 0.0495], device='cuda:1'), in_proj_covar=tensor([0.0199, 0.0170, 0.0173, 0.0198, 0.0208, 0.0206, 0.0182, 0.0210], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:35:59,701 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2459, 4.5860, 2.9574, 2.7364, 3.9806, 2.7479, 3.9843, 3.2245], device='cuda:1'), covar=tensor([0.0824, 0.0655, 0.1218, 0.1514, 0.0339, 0.1397, 0.0536, 0.0917], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0267, 0.0181, 0.0206, 0.0147, 0.0189, 0.0206, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:36:01,731 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313885.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:36:10,481 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.656e+02 3.061e+02 3.514e+02 6.124e+02, threshold=6.123e+02, percent-clipped=0.0 2023-05-18 21:36:29,371 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313925.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:36:29,404 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=313925.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:36:30,721 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=313927.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 21:36:32,514 INFO [finetune.py:992] (1/2) Epoch 18, batch 5950, loss[loss=0.1437, simple_loss=0.23, pruned_loss=0.02873, over 12334.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2522, pruned_loss=0.0366, over 2380420.53 frames. ], batch size: 30, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:37:02,055 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313973.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:37:03,485 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=313975.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 21:37:04,564 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.66 vs. limit=5.0 2023-05-18 21:37:06,865 INFO [finetune.py:992] (1/2) Epoch 18, batch 6000, loss[loss=0.202, simple_loss=0.2891, pruned_loss=0.05746, over 12067.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.253, pruned_loss=0.03723, over 2374695.50 frames. 
], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:37:06,865 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-18 21:37:15,068 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9718, 2.2063, 3.2991, 2.8575, 3.1280, 3.1607, 2.2183, 3.3589], device='cuda:1'), covar=tensor([0.0163, 0.0406, 0.0087, 0.0231, 0.0153, 0.0133, 0.0424, 0.0106], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0220, 0.0207, 0.0203, 0.0237, 0.0182, 0.0213, 0.0205], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:37:15,349 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1945, 2.4309, 3.7579, 3.0730, 3.5178, 3.3294, 2.5620, 3.6780], device='cuda:1'), covar=tensor([0.0145, 0.0452, 0.0091, 0.0254, 0.0137, 0.0155, 0.0386, 0.0113], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0220, 0.0207, 0.0203, 0.0237, 0.0182, 0.0213, 0.0205], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:37:24,913 INFO [finetune.py:1026] (1/2) Epoch 18, validation: loss=0.3118, simple_loss=0.3886, pruned_loss=0.1174, over 1020973.00 frames. 2023-05-18 21:37:24,913 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12300MB 2023-05-18 21:37:33,046 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.29 vs. limit=5.0 2023-05-18 21:37:37,458 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.807e+02 3.210e+02 3.782e+02 1.181e+03, threshold=6.421e+02, percent-clipped=7.0 2023-05-18 21:37:46,378 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314006.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:37:53,801 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-18 21:38:02,868 INFO [finetune.py:992] (1/2) Epoch 18, batch 6050, loss[loss=0.1521, simple_loss=0.2346, pruned_loss=0.03476, over 12093.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2528, pruned_loss=0.037, over 2378776.83 frames. ], batch size: 32, lr: 3.23e-03, grad_scale: 16.0 2023-05-18 21:38:19,476 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314054.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:38:37,384 INFO [finetune.py:992] (1/2) Epoch 18, batch 6100, loss[loss=0.1555, simple_loss=0.2491, pruned_loss=0.0309, over 12098.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2526, pruned_loss=0.0369, over 2372300.49 frames. ], batch size: 32, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:38:51,132 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.400e+02 2.937e+02 3.569e+02 7.911e+02, threshold=5.875e+02, percent-clipped=1.0 2023-05-18 21:39:13,065 INFO [finetune.py:992] (1/2) Epoch 18, batch 6150, loss[loss=0.1804, simple_loss=0.2742, pruned_loss=0.04336, over 12151.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.251, pruned_loss=0.03608, over 2381569.06 frames. ], batch size: 39, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:39:27,090 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314150.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:39:47,818 INFO [finetune.py:992] (1/2) Epoch 18, batch 6200, loss[loss=0.179, simple_loss=0.2729, pruned_loss=0.04252, over 11796.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2506, pruned_loss=0.03572, over 2383853.50 frames. 
], batch size: 44, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:39:47,901 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314180.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:01,102 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.588e+02 3.111e+02 3.927e+02 7.134e+02, threshold=6.221e+02, percent-clipped=2.0 2023-05-18 21:40:02,242 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314200.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:15,634 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314220.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:22,478 INFO [finetune.py:992] (1/2) Epoch 18, batch 6250, loss[loss=0.1592, simple_loss=0.2516, pruned_loss=0.03344, over 10369.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2521, pruned_loss=0.0365, over 2369786.34 frames. ], batch size: 68, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:40:25,449 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.1131, 6.0848, 5.8860, 5.3053, 5.3322, 6.0158, 5.6266, 5.4468], device='cuda:1'), covar=tensor([0.0619, 0.0828, 0.0580, 0.1591, 0.0686, 0.0689, 0.1414, 0.0973], device='cuda:1'), in_proj_covar=tensor([0.0659, 0.0585, 0.0538, 0.0658, 0.0442, 0.0757, 0.0815, 0.0590], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 21:40:45,207 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=314261.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:40:58,297 INFO [finetune.py:992] (1/2) Epoch 18, batch 6300, loss[loss=0.1715, simple_loss=0.2684, pruned_loss=0.03726, over 12102.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2526, pruned_loss=0.03655, over 2371026.55 frames. ], batch size: 38, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:41:11,487 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.730e+02 2.601e+02 2.924e+02 3.480e+02 6.548e+02, threshold=5.848e+02, percent-clipped=1.0 2023-05-18 21:41:17,905 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3841, 4.7511, 3.0087, 2.5055, 4.1902, 2.3380, 4.0984, 3.0381], device='cuda:1'), covar=tensor([0.0679, 0.0614, 0.1198, 0.2006, 0.0301, 0.1863, 0.0523, 0.1133], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0267, 0.0181, 0.0206, 0.0146, 0.0188, 0.0206, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:41:32,872 INFO [finetune.py:992] (1/2) Epoch 18, batch 6350, loss[loss=0.2363, simple_loss=0.3127, pruned_loss=0.07997, over 8190.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2524, pruned_loss=0.03634, over 2374482.25 frames. ], batch size: 98, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:42:07,286 INFO [finetune.py:992] (1/2) Epoch 18, batch 6400, loss[loss=0.1376, simple_loss=0.233, pruned_loss=0.02112, over 12254.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03653, over 2370714.85 frames. ], batch size: 32, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:42:10,428 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.74 vs. 
limit=5.0 2023-05-18 21:42:11,645 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5341, 3.4616, 3.2694, 3.2085, 2.8350, 2.6639, 3.5685, 2.3891], device='cuda:1'), covar=tensor([0.0411, 0.0167, 0.0174, 0.0219, 0.0420, 0.0425, 0.0150, 0.0511], device='cuda:1'), in_proj_covar=tensor([0.0198, 0.0169, 0.0172, 0.0198, 0.0208, 0.0205, 0.0179, 0.0210], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:42:17,166 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-18 21:42:21,462 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.738e+02 3.152e+02 3.622e+02 1.272e+03, threshold=6.305e+02, percent-clipped=1.0 2023-05-18 21:42:43,363 INFO [finetune.py:992] (1/2) Epoch 18, batch 6450, loss[loss=0.1796, simple_loss=0.2706, pruned_loss=0.04433, over 11835.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2536, pruned_loss=0.03661, over 2362981.85 frames. ], batch size: 44, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:42:57,314 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314450.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:18,138 INFO [finetune.py:992] (1/2) Epoch 18, batch 6500, loss[loss=0.1634, simple_loss=0.2546, pruned_loss=0.03605, over 12345.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03651, over 2365839.55 frames. ], batch size: 36, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:43:18,320 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314480.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:30,913 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314498.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:31,505 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.650e+02 3.059e+02 3.575e+02 7.984e+02, threshold=6.118e+02, percent-clipped=1.0 2023-05-18 21:43:35,832 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1068, 3.7484, 3.9473, 4.3604, 2.7891, 3.7308, 2.6107, 3.8092], device='cuda:1'), covar=tensor([0.1809, 0.0991, 0.1122, 0.0717, 0.1470, 0.0810, 0.2063, 0.1207], device='cuda:1'), in_proj_covar=tensor([0.0235, 0.0277, 0.0308, 0.0369, 0.0251, 0.0252, 0.0269, 0.0381], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:43:45,967 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314520.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:52,065 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314528.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:43:53,435 INFO [finetune.py:992] (1/2) Epoch 18, batch 6550, loss[loss=0.1501, simple_loss=0.2325, pruned_loss=0.03385, over 12127.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2527, pruned_loss=0.0362, over 2369070.77 frames. 
], batch size: 30, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:43:57,866 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314536.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:11,925 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314556.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:20,023 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314568.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:28,279 INFO [finetune.py:992] (1/2) Epoch 18, batch 6600, loss[loss=0.1461, simple_loss=0.2287, pruned_loss=0.03177, over 12336.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2526, pruned_loss=0.03614, over 2379703.45 frames. ], batch size: 30, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:44:40,292 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=314597.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:44:41,493 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.516e+02 3.007e+02 3.648e+02 7.193e+02, threshold=6.013e+02, percent-clipped=1.0 2023-05-18 21:44:57,478 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-05-18 21:45:03,480 INFO [finetune.py:992] (1/2) Epoch 18, batch 6650, loss[loss=0.1637, simple_loss=0.2648, pruned_loss=0.03129, over 12347.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2527, pruned_loss=0.03592, over 2384257.45 frames. ], batch size: 36, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:45:04,543 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.66 vs. limit=2.0 2023-05-18 21:45:20,982 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5601, 2.8933, 3.9426, 2.2008, 2.5834, 3.1016, 2.8802, 3.2146], device='cuda:1'), covar=tensor([0.0605, 0.1332, 0.0351, 0.1375, 0.1875, 0.1702, 0.1428, 0.1060], device='cuda:1'), in_proj_covar=tensor([0.0246, 0.0247, 0.0268, 0.0190, 0.0248, 0.0306, 0.0234, 0.0278], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:45:38,812 INFO [finetune.py:992] (1/2) Epoch 18, batch 6700, loss[loss=0.1757, simple_loss=0.2699, pruned_loss=0.04075, over 10543.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2531, pruned_loss=0.03625, over 2376872.75 frames. ], batch size: 68, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:45:52,591 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.657e+02 3.017e+02 3.709e+02 7.789e+02, threshold=6.033e+02, percent-clipped=3.0 2023-05-18 21:46:03,790 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4811, 2.5555, 3.0889, 4.3080, 2.2010, 4.4170, 4.4994, 4.5785], device='cuda:1'), covar=tensor([0.0178, 0.1368, 0.0585, 0.0184, 0.1508, 0.0249, 0.0172, 0.0115], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0207, 0.0186, 0.0125, 0.0191, 0.0184, 0.0184, 0.0127], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:46:14,030 INFO [finetune.py:992] (1/2) Epoch 18, batch 6750, loss[loss=0.1769, simple_loss=0.2736, pruned_loss=0.04007, over 12154.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2525, pruned_loss=0.03612, over 2384813.47 frames. 
], batch size: 36, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:46:34,898 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2705, 6.1736, 5.7646, 5.7290, 6.2552, 5.5077, 5.6647, 5.7662], device='cuda:1'), covar=tensor([0.1529, 0.0877, 0.1259, 0.1737, 0.0839, 0.2201, 0.1959, 0.1158], device='cuda:1'), in_proj_covar=tensor([0.0380, 0.0531, 0.0424, 0.0468, 0.0487, 0.0466, 0.0426, 0.0409], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:46:49,122 INFO [finetune.py:992] (1/2) Epoch 18, batch 6800, loss[loss=0.1441, simple_loss=0.2368, pruned_loss=0.02574, over 12190.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2524, pruned_loss=0.03619, over 2379484.75 frames. ], batch size: 31, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:47:02,249 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.521e+02 2.899e+02 3.298e+02 8.690e+02, threshold=5.798e+02, percent-clipped=2.0 2023-05-18 21:47:25,166 INFO [finetune.py:992] (1/2) Epoch 18, batch 6850, loss[loss=0.2011, simple_loss=0.2834, pruned_loss=0.05936, over 8013.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.252, pruned_loss=0.03587, over 2380405.65 frames. ], batch size: 98, lr: 3.23e-03, grad_scale: 8.0 2023-05-18 21:47:41,927 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7222, 2.6604, 4.6480, 4.8524, 2.9634, 2.5148, 2.8956, 2.0628], device='cuda:1'), covar=tensor([0.1644, 0.3320, 0.0434, 0.0357, 0.1154, 0.2649, 0.2999, 0.4390], device='cuda:1'), in_proj_covar=tensor([0.0313, 0.0400, 0.0285, 0.0312, 0.0284, 0.0326, 0.0410, 0.0387], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:47:43,863 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=314856.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:00,054 INFO [finetune.py:992] (1/2) Epoch 18, batch 6900, loss[loss=0.1762, simple_loss=0.2712, pruned_loss=0.0406, over 12150.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2529, pruned_loss=0.03636, over 2378645.43 frames. ], batch size: 39, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:48:00,523 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. 
limit=2.0 2023-05-18 21:48:08,484 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=314892.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:13,840 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.648e+02 3.153e+02 3.942e+02 5.673e+02, threshold=6.307e+02, percent-clipped=0.0 2023-05-18 21:48:16,552 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=314904.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:18,824 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3334, 5.1664, 5.3305, 5.3342, 4.9921, 4.9920, 4.7441, 5.2298], device='cuda:1'), covar=tensor([0.0702, 0.0580, 0.0807, 0.0524, 0.1699, 0.1288, 0.0573, 0.1138], device='cuda:1'), in_proj_covar=tensor([0.0561, 0.0738, 0.0644, 0.0652, 0.0874, 0.0776, 0.0583, 0.0502], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:48:20,866 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8379, 5.6099, 5.2323, 5.1399, 5.6842, 4.9155, 5.0380, 5.0953], device='cuda:1'), covar=tensor([0.1663, 0.0999, 0.1222, 0.1882, 0.1008, 0.2622, 0.2268, 0.1384], device='cuda:1'), in_proj_covar=tensor([0.0379, 0.0530, 0.0423, 0.0468, 0.0487, 0.0467, 0.0427, 0.0408], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:48:28,596 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.9595, 2.1633, 3.1233, 3.8435, 2.0298, 3.9675, 3.9000, 4.1136], device='cuda:1'), covar=tensor([0.0156, 0.1448, 0.0483, 0.0161, 0.1495, 0.0241, 0.0211, 0.0102], device='cuda:1'), in_proj_covar=tensor([0.0127, 0.0207, 0.0186, 0.0125, 0.0191, 0.0185, 0.0184, 0.0127], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:48:34,560 INFO [finetune.py:992] (1/2) Epoch 18, batch 6950, loss[loss=0.1552, simple_loss=0.241, pruned_loss=0.03472, over 12251.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2535, pruned_loss=0.03632, over 2377697.93 frames. ], batch size: 32, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:48:37,522 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314934.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:48:56,337 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=314960.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:49:01,311 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3278, 4.7951, 3.0077, 2.8213, 4.0607, 2.5215, 3.9424, 3.2218], device='cuda:1'), covar=tensor([0.0749, 0.0445, 0.1051, 0.1408, 0.0330, 0.1491, 0.0538, 0.0874], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0263, 0.0180, 0.0203, 0.0146, 0.0186, 0.0204, 0.0177], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:49:10,230 INFO [finetune.py:992] (1/2) Epoch 18, batch 7000, loss[loss=0.1385, simple_loss=0.2153, pruned_loss=0.03086, over 11801.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2532, pruned_loss=0.03623, over 2383956.52 frames. 
], batch size: 26, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:49:21,137 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=314995.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:49:24,529 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.740e+02 3.038e+02 3.572e+02 5.545e+02, threshold=6.076e+02, percent-clipped=0.0 2023-05-18 21:49:31,870 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.8910, 5.9064, 5.6502, 5.2384, 5.1090, 5.8151, 5.4022, 5.2099], device='cuda:1'), covar=tensor([0.0835, 0.0871, 0.0709, 0.1478, 0.0854, 0.0704, 0.1687, 0.1189], device='cuda:1'), in_proj_covar=tensor([0.0662, 0.0586, 0.0538, 0.0661, 0.0441, 0.0759, 0.0817, 0.0590], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 21:49:32,139 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 21:49:34,802 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2875, 4.7242, 3.9414, 4.9491, 4.3149, 2.5990, 4.0075, 2.7790], device='cuda:1'), covar=tensor([0.0770, 0.0633, 0.1638, 0.0497, 0.1346, 0.1984, 0.1269, 0.3545], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0387, 0.0370, 0.0346, 0.0380, 0.0282, 0.0356, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:49:39,570 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315021.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 21:49:45,683 INFO [finetune.py:992] (1/2) Epoch 18, batch 7050, loss[loss=0.1668, simple_loss=0.2616, pruned_loss=0.03601, over 12285.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2537, pruned_loss=0.03658, over 2385972.72 frames. ], batch size: 33, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:49:45,812 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315030.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:49:49,387 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5205, 2.5055, 3.3256, 4.3559, 2.3799, 4.4055, 4.4410, 4.5437], device='cuda:1'), covar=tensor([0.0130, 0.1404, 0.0469, 0.0152, 0.1341, 0.0233, 0.0182, 0.0102], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0207, 0.0185, 0.0125, 0.0191, 0.0184, 0.0184, 0.0128], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:50:20,629 INFO [finetune.py:992] (1/2) Epoch 18, batch 7100, loss[loss=0.1711, simple_loss=0.2602, pruned_loss=0.04104, over 12368.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2536, pruned_loss=0.03643, over 2393429.24 frames. ], batch size: 36, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:50:28,587 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315091.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:50:34,545 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.567e+02 2.898e+02 3.446e+02 5.668e+02, threshold=5.796e+02, percent-clipped=0.0 2023-05-18 21:50:56,763 INFO [finetune.py:992] (1/2) Epoch 18, batch 7150, loss[loss=0.1499, simple_loss=0.2273, pruned_loss=0.03626, over 12270.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.03648, over 2387996.95 frames. 
], batch size: 28, lr: 3.22e-03, grad_scale: 4.0 2023-05-18 21:51:13,100 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5764, 5.1082, 5.5659, 4.8526, 5.2060, 5.0191, 5.5886, 5.0941], device='cuda:1'), covar=tensor([0.0271, 0.0428, 0.0268, 0.0277, 0.0380, 0.0293, 0.0202, 0.0367], device='cuda:1'), in_proj_covar=tensor([0.0284, 0.0289, 0.0312, 0.0282, 0.0282, 0.0281, 0.0255, 0.0230], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:51:25,218 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-05-18 21:51:27,084 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315173.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:51:31,669 INFO [finetune.py:992] (1/2) Epoch 18, batch 7200, loss[loss=0.1535, simple_loss=0.2402, pruned_loss=0.03338, over 12292.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2523, pruned_loss=0.03622, over 2382495.85 frames. ], batch size: 33, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:51:39,935 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315192.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:51:45,343 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.526e+02 2.907e+02 3.593e+02 5.562e+02, threshold=5.814e+02, percent-clipped=0.0 2023-05-18 21:52:01,856 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4637, 5.2712, 5.4446, 5.4559, 5.0617, 5.1098, 4.9264, 5.3268], device='cuda:1'), covar=tensor([0.0738, 0.0689, 0.0837, 0.0622, 0.1955, 0.1435, 0.0530, 0.1212], device='cuda:1'), in_proj_covar=tensor([0.0563, 0.0743, 0.0650, 0.0656, 0.0885, 0.0781, 0.0586, 0.0503], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:52:06,692 INFO [finetune.py:992] (1/2) Epoch 18, batch 7250, loss[loss=0.1487, simple_loss=0.2371, pruned_loss=0.03009, over 12170.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2523, pruned_loss=0.03601, over 2391338.99 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:52:09,570 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315234.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:52:13,720 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315240.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:52:42,443 INFO [finetune.py:992] (1/2) Epoch 18, batch 7300, loss[loss=0.1368, simple_loss=0.2249, pruned_loss=0.02435, over 12343.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2536, pruned_loss=0.03634, over 2379900.13 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:52:49,450 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315290.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:52:56,198 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.573e+02 3.018e+02 3.634e+02 9.167e+02, threshold=6.037e+02, percent-clipped=2.0 2023-05-18 21:53:06,288 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. 
limit=2.0 2023-05-18 21:53:07,223 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315316.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 21:53:12,858 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.1978, 6.1389, 5.9378, 5.4927, 5.3692, 6.1067, 5.7159, 5.4971], device='cuda:1'), covar=tensor([0.0741, 0.0963, 0.0681, 0.1589, 0.0645, 0.0674, 0.1550, 0.1134], device='cuda:1'), in_proj_covar=tensor([0.0672, 0.0593, 0.0546, 0.0671, 0.0447, 0.0770, 0.0828, 0.0599], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:1') 2023-05-18 21:53:12,909 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315324.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:16,971 INFO [finetune.py:992] (1/2) Epoch 18, batch 7350, loss[loss=0.139, simple_loss=0.233, pruned_loss=0.02251, over 12022.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2533, pruned_loss=0.03642, over 2373090.02 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:53:18,088 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-05-18 21:53:43,073 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2023-05-18 21:53:47,531 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315374.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:51,523 INFO [finetune.py:992] (1/2) Epoch 18, batch 7400, loss[loss=0.1702, simple_loss=0.2678, pruned_loss=0.03629, over 12195.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2534, pruned_loss=0.03666, over 2373199.82 frames. ], batch size: 35, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:53:53,091 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315382.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:55,187 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315385.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:55,725 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315386.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:53:58,783 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1384, 3.8284, 4.0550, 4.3566, 3.0621, 3.7907, 2.7487, 4.0463], device='cuda:1'), covar=tensor([0.1609, 0.0836, 0.0917, 0.0707, 0.1178, 0.0679, 0.1679, 0.1012], device='cuda:1'), in_proj_covar=tensor([0.0229, 0.0270, 0.0299, 0.0359, 0.0245, 0.0244, 0.0261, 0.0371], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:54:05,940 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.760e+02 2.520e+02 3.033e+02 3.513e+02 9.241e+02, threshold=6.066e+02, percent-clipped=3.0 2023-05-18 21:54:27,604 INFO [finetune.py:992] (1/2) Epoch 18, batch 7450, loss[loss=0.1442, simple_loss=0.2351, pruned_loss=0.0266, over 12096.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2534, pruned_loss=0.03684, over 2360725.86 frames. 
], batch size: 32, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:54:31,188 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9207, 5.7671, 5.3430, 5.2027, 5.8406, 4.9672, 5.2119, 5.1821], device='cuda:1'), covar=tensor([0.1753, 0.0948, 0.1243, 0.2172, 0.0951, 0.2797, 0.1966, 0.1513], device='cuda:1'), in_proj_covar=tensor([0.0375, 0.0526, 0.0422, 0.0465, 0.0482, 0.0464, 0.0425, 0.0407], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:54:31,329 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315435.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:54:36,918 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315443.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:54:59,067 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4196, 5.2240, 5.3627, 5.3891, 5.0053, 5.0766, 4.8266, 5.2495], device='cuda:1'), covar=tensor([0.0693, 0.0664, 0.0886, 0.0573, 0.2151, 0.1368, 0.0552, 0.1314], device='cuda:1'), in_proj_covar=tensor([0.0564, 0.0747, 0.0652, 0.0660, 0.0889, 0.0785, 0.0590, 0.0508], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:55:01,193 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3231, 2.5135, 3.6485, 4.3898, 3.8452, 4.3687, 3.7641, 3.0928], device='cuda:1'), covar=tensor([0.0059, 0.0433, 0.0153, 0.0048, 0.0130, 0.0088, 0.0134, 0.0428], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0128, 0.0110, 0.0085, 0.0110, 0.0122, 0.0107, 0.0146], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 21:55:02,372 INFO [finetune.py:992] (1/2) Epoch 18, batch 7500, loss[loss=0.1494, simple_loss=0.2252, pruned_loss=0.03681, over 12294.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2529, pruned_loss=0.03674, over 2366887.07 frames. ], batch size: 28, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:55:08,980 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 21:55:12,367 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-18 21:55:16,093 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.622e+02 2.675e+02 3.112e+02 3.844e+02 9.516e+02, threshold=6.223e+02, percent-clipped=2.0 2023-05-18 21:55:20,107 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-18 21:55:36,155 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315529.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:55:36,790 INFO [finetune.py:992] (1/2) Epoch 18, batch 7550, loss[loss=0.128, simple_loss=0.2176, pruned_loss=0.01921, over 12002.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2538, pruned_loss=0.03692, over 2355909.18 frames. ], batch size: 28, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:56:02,644 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.67 vs. limit=5.0 2023-05-18 21:56:12,510 INFO [finetune.py:992] (1/2) Epoch 18, batch 7600, loss[loss=0.1397, simple_loss=0.2225, pruned_loss=0.02849, over 12023.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2541, pruned_loss=0.03713, over 2362792.69 frames. 
], batch size: 28, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:56:19,733 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315590.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:56:26,712 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.697e+02 3.069e+02 3.511e+02 7.016e+02, threshold=6.139e+02, percent-clipped=2.0 2023-05-18 21:56:28,588 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2832, 4.6708, 4.0103, 4.8701, 4.5217, 2.9073, 4.2414, 2.9387], device='cuda:1'), covar=tensor([0.0852, 0.0766, 0.1689, 0.0531, 0.1127, 0.1835, 0.1127, 0.3571], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0388, 0.0370, 0.0347, 0.0382, 0.0281, 0.0357, 0.0375], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:56:38,041 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315616.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:56:39,017 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-18 21:56:44,307 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4464, 4.7952, 4.2327, 5.0494, 4.6002, 3.0371, 4.1723, 3.1142], device='cuda:1'), covar=tensor([0.0794, 0.0761, 0.1427, 0.0551, 0.1146, 0.1735, 0.1239, 0.3511], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0387, 0.0369, 0.0345, 0.0381, 0.0280, 0.0356, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:56:47,510 INFO [finetune.py:992] (1/2) Epoch 18, batch 7650, loss[loss=0.1547, simple_loss=0.254, pruned_loss=0.02766, over 12188.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.254, pruned_loss=0.03684, over 2374002.82 frames. ], batch size: 35, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:56:51,237 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5856, 5.1284, 5.5462, 4.8320, 5.1988, 4.9698, 5.5782, 5.1912], device='cuda:1'), covar=tensor([0.0296, 0.0436, 0.0296, 0.0265, 0.0393, 0.0345, 0.0204, 0.0236], device='cuda:1'), in_proj_covar=tensor([0.0281, 0.0287, 0.0309, 0.0280, 0.0278, 0.0278, 0.0252, 0.0227], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:56:53,136 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315638.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:11,594 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315664.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:22,770 INFO [finetune.py:992] (1/2) Epoch 18, batch 7700, loss[loss=0.1813, simple_loss=0.2765, pruned_loss=0.04298, over 12013.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2531, pruned_loss=0.03668, over 2372097.15 frames. 
], batch size: 40, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:57:22,855 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315680.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:27,045 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315686.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:57:37,195 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.612e+02 3.026e+02 3.917e+02 7.864e+02, threshold=6.052e+02, percent-clipped=3.0 2023-05-18 21:57:49,043 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2799, 2.7826, 3.9356, 3.3300, 3.7562, 3.4409, 2.8721, 3.8028], device='cuda:1'), covar=tensor([0.0163, 0.0373, 0.0137, 0.0248, 0.0154, 0.0203, 0.0378, 0.0166], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0218, 0.0207, 0.0202, 0.0235, 0.0181, 0.0210, 0.0205], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 21:57:58,407 INFO [finetune.py:992] (1/2) Epoch 18, batch 7750, loss[loss=0.1641, simple_loss=0.2552, pruned_loss=0.0365, over 12146.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2536, pruned_loss=0.03707, over 2367973.78 frames. ], batch size: 34, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:57:58,487 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315730.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:58:01,092 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315734.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:58:03,932 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=315738.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:58:19,923 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3329, 4.7879, 2.9964, 2.7987, 4.0217, 2.7090, 4.1101, 3.2873], device='cuda:1'), covar=tensor([0.0831, 0.0464, 0.1192, 0.1550, 0.0363, 0.1423, 0.0480, 0.0892], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0267, 0.0183, 0.0206, 0.0147, 0.0189, 0.0206, 0.0179], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 21:58:33,086 INFO [finetune.py:992] (1/2) Epoch 18, batch 7800, loss[loss=0.1438, simple_loss=0.2369, pruned_loss=0.02537, over 12255.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2535, pruned_loss=0.03689, over 2371557.48 frames. ], batch size: 32, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:58:46,893 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.693e+02 3.248e+02 3.855e+02 6.960e+02, threshold=6.497e+02, percent-clipped=3.0 2023-05-18 21:58:48,977 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2023-05-18 21:59:07,476 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315829.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:59:07,958 INFO [finetune.py:992] (1/2) Epoch 18, batch 7850, loss[loss=0.1633, simple_loss=0.2654, pruned_loss=0.03062, over 12153.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2537, pruned_loss=0.03706, over 2370253.61 frames. 
], batch size: 39, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:59:41,991 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=315877.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 21:59:44,007 INFO [finetune.py:992] (1/2) Epoch 18, batch 7900, loss[loss=0.1641, simple_loss=0.2439, pruned_loss=0.04211, over 12014.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2539, pruned_loss=0.03719, over 2367344.09 frames. ], batch size: 28, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 21:59:57,752 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.526e+02 2.964e+02 3.737e+02 6.219e+02, threshold=5.928e+02, percent-clipped=0.0 2023-05-18 22:00:06,780 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=315913.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:00:09,090 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.14 vs. limit=2.0 2023-05-18 22:00:18,464 INFO [finetune.py:992] (1/2) Epoch 18, batch 7950, loss[loss=0.1707, simple_loss=0.2627, pruned_loss=0.03929, over 12306.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.253, pruned_loss=0.03635, over 2374646.06 frames. ], batch size: 34, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:00:27,843 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1769, 4.5086, 2.6481, 2.4597, 3.8802, 2.5210, 3.9156, 3.0161], device='cuda:1'), covar=tensor([0.0780, 0.0616, 0.1310, 0.1719, 0.0298, 0.1418, 0.0463, 0.0890], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0266, 0.0182, 0.0205, 0.0146, 0.0188, 0.0205, 0.0178], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:00:29,365 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 22:00:49,339 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=315974.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:00:53,376 INFO [finetune.py:992] (1/2) Epoch 18, batch 8000, loss[loss=0.2793, simple_loss=0.351, pruned_loss=0.1038, over 7757.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2532, pruned_loss=0.03637, over 2369723.80 frames. ], batch size: 97, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:00:53,500 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=315980.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:08,518 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.851e+02 2.505e+02 2.918e+02 3.507e+02 5.646e+02, threshold=5.837e+02, percent-clipped=0.0 2023-05-18 22:01:21,443 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=316014.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:31,082 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316028.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:32,385 INFO [finetune.py:992] (1/2) Epoch 18, batch 8050, loss[loss=0.1481, simple_loss=0.233, pruned_loss=0.03155, over 12358.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2534, pruned_loss=0.03618, over 2371861.12 frames. 
], batch size: 30, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:01:32,514 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316030.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:38,045 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316038.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:01:53,328 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.0353, 2.4155, 3.6890, 3.1055, 3.6208, 3.1371, 2.4562, 3.6389], device='cuda:1'), covar=tensor([0.0183, 0.0479, 0.0192, 0.0286, 0.0151, 0.0252, 0.0490, 0.0155], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0217, 0.0207, 0.0202, 0.0234, 0.0181, 0.0209, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:02:03,958 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=316075.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:05,941 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316078.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:07,314 INFO [finetune.py:992] (1/2) Epoch 18, batch 8100, loss[loss=0.1566, simple_loss=0.2472, pruned_loss=0.03302, over 12278.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2541, pruned_loss=0.03662, over 2373759.39 frames. ], batch size: 33, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:02:11,429 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316086.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:02:20,967 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.705e+02 2.492e+02 2.909e+02 3.620e+02 8.244e+02, threshold=5.819e+02, percent-clipped=2.0 2023-05-18 22:02:31,473 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.3048, 2.9217, 2.7957, 2.7772, 2.4889, 2.3480, 2.8399, 2.0372], device='cuda:1'), covar=tensor([0.0444, 0.0227, 0.0255, 0.0241, 0.0418, 0.0385, 0.0205, 0.0521], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0173, 0.0177, 0.0202, 0.0211, 0.0209, 0.0186, 0.0215], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:02:41,678 INFO [finetune.py:992] (1/2) Epoch 18, batch 8150, loss[loss=0.1612, simple_loss=0.253, pruned_loss=0.03469, over 12014.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.255, pruned_loss=0.03713, over 2365019.42 frames. ], batch size: 40, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:03:17,442 INFO [finetune.py:992] (1/2) Epoch 18, batch 8200, loss[loss=0.1731, simple_loss=0.2745, pruned_loss=0.03585, over 12033.00 frames. ], tot_loss[loss=0.1648, simple_loss=0.2549, pruned_loss=0.03734, over 2366110.57 frames. ], batch size: 42, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:03:25,515 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.63 vs. 
limit=2.0 2023-05-18 22:03:25,980 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0832, 4.6895, 5.1154, 4.4714, 4.7387, 4.5614, 5.0985, 4.8414], device='cuda:1'), covar=tensor([0.0316, 0.0433, 0.0268, 0.0284, 0.0442, 0.0369, 0.0209, 0.0366], device='cuda:1'), in_proj_covar=tensor([0.0283, 0.0288, 0.0309, 0.0280, 0.0279, 0.0278, 0.0252, 0.0227], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:03:31,289 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.709e+02 3.241e+02 3.987e+02 7.613e+02, threshold=6.482e+02, percent-clipped=5.0 2023-05-18 22:03:33,208 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8034, 3.2303, 5.0912, 2.7837, 2.8664, 3.8341, 3.2454, 3.6676], device='cuda:1'), covar=tensor([0.0423, 0.1212, 0.0404, 0.1170, 0.1905, 0.1599, 0.1426, 0.1381], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0244, 0.0269, 0.0189, 0.0245, 0.0304, 0.0233, 0.0277], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:03:52,110 INFO [finetune.py:992] (1/2) Epoch 18, batch 8250, loss[loss=0.1493, simple_loss=0.2459, pruned_loss=0.02632, over 12151.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2546, pruned_loss=0.03741, over 2365917.97 frames. ], batch size: 36, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:04:19,479 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=316269.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:04:26,918 INFO [finetune.py:992] (1/2) Epoch 18, batch 8300, loss[loss=0.1737, simple_loss=0.2671, pruned_loss=0.04018, over 12112.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2535, pruned_loss=0.03726, over 2368258.71 frames. ], batch size: 33, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:04:27,392 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.44 vs. limit=5.0 2023-05-18 22:04:42,133 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.639e+02 2.605e+02 3.028e+02 3.553e+02 8.114e+02, threshold=6.056e+02, percent-clipped=2.0 2023-05-18 22:05:02,688 INFO [finetune.py:992] (1/2) Epoch 18, batch 8350, loss[loss=0.1597, simple_loss=0.2493, pruned_loss=0.03504, over 12263.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2536, pruned_loss=0.03715, over 2379563.12 frames. ], batch size: 32, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:05:30,351 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=316370.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:05:32,717 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.11 vs. limit=5.0 2023-05-18 22:05:37,168 INFO [finetune.py:992] (1/2) Epoch 18, batch 8400, loss[loss=0.1439, simple_loss=0.2352, pruned_loss=0.02635, over 12191.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2529, pruned_loss=0.03664, over 2378898.37 frames. ], batch size: 31, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:05:50,998 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.662e+02 2.560e+02 2.944e+02 3.523e+02 7.440e+02, threshold=5.888e+02, percent-clipped=2.0 2023-05-18 22:06:13,252 INFO [finetune.py:992] (1/2) Epoch 18, batch 8450, loss[loss=0.2359, simple_loss=0.308, pruned_loss=0.08193, over 7926.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2539, pruned_loss=0.03711, over 2370695.46 frames. 
], batch size: 99, lr: 3.22e-03, grad_scale: 8.0 2023-05-18 22:06:33,347 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.2141, 6.1762, 5.9500, 5.5248, 5.4656, 6.1163, 5.7328, 5.4842], device='cuda:1'), covar=tensor([0.0612, 0.0774, 0.0616, 0.1805, 0.0754, 0.0686, 0.1486, 0.0958], device='cuda:1'), in_proj_covar=tensor([0.0664, 0.0583, 0.0538, 0.0661, 0.0439, 0.0761, 0.0816, 0.0591], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 22:06:41,763 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0119, 5.9817, 5.5902, 5.4523, 6.0106, 5.1734, 5.4798, 5.4633], device='cuda:1'), covar=tensor([0.1573, 0.0870, 0.1068, 0.1903, 0.0923, 0.2416, 0.1882, 0.1346], device='cuda:1'), in_proj_covar=tensor([0.0378, 0.0531, 0.0423, 0.0469, 0.0486, 0.0467, 0.0426, 0.0409], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:06:48,142 INFO [finetune.py:992] (1/2) Epoch 18, batch 8500, loss[loss=0.1437, simple_loss=0.2294, pruned_loss=0.02906, over 12347.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2543, pruned_loss=0.03698, over 2374394.64 frames. ], batch size: 31, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:07:01,938 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.609e+02 3.108e+02 3.624e+02 7.556e+02, threshold=6.217e+02, percent-clipped=4.0 2023-05-18 22:07:05,475 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4586, 2.6444, 3.1514, 4.3130, 2.1779, 4.3302, 4.4352, 4.4322], device='cuda:1'), covar=tensor([0.0169, 0.1317, 0.0544, 0.0159, 0.1589, 0.0260, 0.0149, 0.0120], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0208, 0.0188, 0.0126, 0.0193, 0.0186, 0.0185, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:07:12,996 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=316516.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:07:14,306 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0360, 3.8268, 4.0148, 3.6954, 3.8806, 3.7677, 4.0212, 3.6087], device='cuda:1'), covar=tensor([0.0425, 0.0479, 0.0371, 0.0287, 0.0433, 0.0307, 0.0314, 0.1326], device='cuda:1'), in_proj_covar=tensor([0.0284, 0.0289, 0.0311, 0.0281, 0.0280, 0.0278, 0.0254, 0.0228], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:07:22,409 INFO [finetune.py:992] (1/2) Epoch 18, batch 8550, loss[loss=0.1665, simple_loss=0.2606, pruned_loss=0.03617, over 11762.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2546, pruned_loss=0.03707, over 2375108.65 frames. ], batch size: 44, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:07:50,318 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316569.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:07:55,864 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=316577.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:07:57,699 INFO [finetune.py:992] (1/2) Epoch 18, batch 8600, loss[loss=0.1635, simple_loss=0.2512, pruned_loss=0.03791, over 12041.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.254, pruned_loss=0.03683, over 2372714.30 frames. 
], batch size: 31, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:08:11,460 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.841e+02 2.393e+02 2.785e+02 3.424e+02 7.085e+02, threshold=5.569e+02, percent-clipped=2.0 2023-05-18 22:08:22,310 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0536, 4.9166, 4.8102, 4.9591, 4.5446, 5.0449, 5.0111, 5.1652], device='cuda:1'), covar=tensor([0.0260, 0.0155, 0.0209, 0.0339, 0.0837, 0.0332, 0.0181, 0.0176], device='cuda:1'), in_proj_covar=tensor([0.0207, 0.0208, 0.0201, 0.0258, 0.0252, 0.0233, 0.0186, 0.0241], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 22:08:23,637 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316617.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:08:32,578 INFO [finetune.py:992] (1/2) Epoch 18, batch 8650, loss[loss=0.1477, simple_loss=0.2297, pruned_loss=0.03284, over 12130.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2539, pruned_loss=0.037, over 2374527.56 frames. ], batch size: 30, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:08:47,109 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 22:09:00,746 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=316670.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:09:07,614 INFO [finetune.py:992] (1/2) Epoch 18, batch 8700, loss[loss=0.1667, simple_loss=0.2616, pruned_loss=0.03594, over 10666.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2527, pruned_loss=0.03699, over 2375911.09 frames. ], batch size: 68, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:09:16,537 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-05-18 22:09:21,420 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.739e+02 3.075e+02 3.684e+02 6.683e+02, threshold=6.151e+02, percent-clipped=3.0 2023-05-18 22:09:22,477 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.61 vs. limit=2.0 2023-05-18 22:09:29,495 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-05-18 22:09:34,480 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=316718.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:09:34,869 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-05-18 22:09:38,782 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5445, 2.7443, 3.2440, 4.4335, 2.4194, 4.3550, 4.5142, 4.5124], device='cuda:1'), covar=tensor([0.0137, 0.1293, 0.0515, 0.0147, 0.1477, 0.0249, 0.0151, 0.0120], device='cuda:1'), in_proj_covar=tensor([0.0130, 0.0210, 0.0189, 0.0128, 0.0195, 0.0189, 0.0187, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:09:43,361 INFO [finetune.py:992] (1/2) Epoch 18, batch 8750, loss[loss=0.1804, simple_loss=0.281, pruned_loss=0.03994, over 12350.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2528, pruned_loss=0.03674, over 2382767.81 frames. ], batch size: 36, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:10:17,799 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-05-18 22:10:18,001 INFO [finetune.py:992] (1/2) Epoch 18, batch 8800, loss[loss=0.1512, simple_loss=0.2448, pruned_loss=0.02882, over 12287.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2526, pruned_loss=0.03666, over 2381762.78 frames. ], batch size: 34, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:10:31,564 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.658e+02 2.959e+02 3.676e+02 1.888e+03, threshold=5.918e+02, percent-clipped=2.0 2023-05-18 22:10:39,652 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4153, 4.7874, 2.9264, 2.7549, 4.0338, 2.9200, 4.0818, 3.4491], device='cuda:1'), covar=tensor([0.0679, 0.0545, 0.1180, 0.1526, 0.0371, 0.1164, 0.0446, 0.0712], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0264, 0.0180, 0.0202, 0.0145, 0.0187, 0.0204, 0.0177], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:10:52,450 INFO [finetune.py:992] (1/2) Epoch 18, batch 8850, loss[loss=0.1618, simple_loss=0.258, pruned_loss=0.03285, over 12366.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.253, pruned_loss=0.03689, over 2373915.27 frames. ], batch size: 38, lr: 3.21e-03, grad_scale: 8.0 2023-05-18 22:11:22,793 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=316872.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:11:24,955 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5135, 2.7499, 3.7026, 4.5657, 3.8449, 4.5720, 3.8400, 3.3206], device='cuda:1'), covar=tensor([0.0042, 0.0417, 0.0162, 0.0048, 0.0142, 0.0084, 0.0151, 0.0381], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0127, 0.0109, 0.0084, 0.0110, 0.0121, 0.0105, 0.0144], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:11:28,238 INFO [finetune.py:992] (1/2) Epoch 18, batch 8900, loss[loss=0.1756, simple_loss=0.2594, pruned_loss=0.04589, over 10513.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2519, pruned_loss=0.03646, over 2378314.34 frames. ], batch size: 68, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:11:42,057 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.606e+02 2.717e+02 3.132e+02 3.823e+02 7.560e+02, threshold=6.264e+02, percent-clipped=1.0 2023-05-18 22:11:43,578 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=316902.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:12:02,467 INFO [finetune.py:992] (1/2) Epoch 18, batch 8950, loss[loss=0.1781, simple_loss=0.2722, pruned_loss=0.04202, over 11621.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2521, pruned_loss=0.03664, over 2377564.02 frames. ], batch size: 48, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:12:07,778 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. 
limit=2.0 2023-05-18 22:12:08,678 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6274, 4.3036, 4.5970, 4.1399, 4.3401, 4.1649, 4.5783, 4.2184], device='cuda:1'), covar=tensor([0.0323, 0.0377, 0.0306, 0.0278, 0.0380, 0.0338, 0.0264, 0.0652], device='cuda:1'), in_proj_covar=tensor([0.0281, 0.0285, 0.0309, 0.0280, 0.0279, 0.0277, 0.0253, 0.0227], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:12:12,723 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4918, 2.6850, 3.7269, 4.4678, 3.8851, 4.5125, 3.7908, 3.2621], device='cuda:1'), covar=tensor([0.0048, 0.0405, 0.0133, 0.0051, 0.0127, 0.0084, 0.0143, 0.0380], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0127, 0.0109, 0.0084, 0.0110, 0.0121, 0.0105, 0.0144], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:12:23,676 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8659, 3.2269, 2.4106, 2.2329, 2.9304, 2.3464, 3.0536, 2.6128], device='cuda:1'), covar=tensor([0.0666, 0.0851, 0.1073, 0.1408, 0.0333, 0.1228, 0.0600, 0.0862], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0264, 0.0181, 0.0203, 0.0145, 0.0187, 0.0204, 0.0177], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:12:25,104 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=316963.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:12:29,493 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.91 vs. limit=5.0 2023-05-18 22:12:36,656 INFO [finetune.py:992] (1/2) Epoch 18, batch 9000, loss[loss=0.1363, simple_loss=0.2137, pruned_loss=0.02946, over 12009.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2537, pruned_loss=0.03743, over 2367682.92 frames. ], batch size: 28, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:12:36,656 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-18 22:12:48,300 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([1.9510, 3.4830, 3.6940, 4.1671, 2.9605, 3.7844, 2.7157, 3.6925], device='cuda:1'), covar=tensor([0.2073, 0.1091, 0.1138, 0.0533, 0.1359, 0.0758, 0.1932, 0.1363], device='cuda:1'), in_proj_covar=tensor([0.0233, 0.0275, 0.0305, 0.0364, 0.0248, 0.0249, 0.0265, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:12:54,883 INFO [finetune.py:1026] (1/2) Epoch 18, validation: loss=0.319, simple_loss=0.3929, pruned_loss=0.1225, over 1020973.00 frames. 
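The recurring "Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=..." entries in this log report quartiles of recent per-batch gradient norms, a derived clipping threshold, and the share of batches that exceeded it. A minimal illustrative sketch of how such statistics could be computed follows; the function name, the "threshold = scale * median" rule, and the window of norms are assumptions for illustration only, not the icefall optim.py implementation.

import torch

def summarize_grad_norms(grad_norms, clipping_scale=2.0):
    # Quartiles (min, 25%, median, 75%, max) of recent per-batch gradient norms,
    # an assumed clipping threshold of clipping_scale * median, and the percentage
    # of batches whose norm exceeded that threshold.
    norms = torch.tensor(grad_norms, dtype=torch.float32)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles.tolist(), threshold.item(), percent_clipped.item()

# Example with a small window of hypothetical norms; compare with an entry such as
# "grad-norm quartiles 1.841e+02 2.393e+02 2.785e+02 3.424e+02 7.085e+02".
print(summarize_grad_norms([184.1, 239.3, 278.5, 342.4, 708.5]))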
2023-05-18 22:12:54,884 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12300MB 2023-05-18 22:13:02,185 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7808, 2.8145, 4.4356, 4.5971, 2.8664, 2.5245, 2.9761, 2.2099], device='cuda:1'), covar=tensor([0.1680, 0.3250, 0.0501, 0.0451, 0.1428, 0.2702, 0.2865, 0.4215], device='cuda:1'), in_proj_covar=tensor([0.0312, 0.0397, 0.0286, 0.0313, 0.0283, 0.0327, 0.0411, 0.0388], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:13:08,544 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 2.672e+02 3.168e+02 4.016e+02 6.334e+02, threshold=6.336e+02, percent-clipped=1.0 2023-05-18 22:13:11,872 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317004.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:13:18,682 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3821, 6.2870, 5.8923, 5.9071, 6.3262, 5.5979, 5.8688, 5.8020], device='cuda:1'), covar=tensor([0.1470, 0.0764, 0.0971, 0.1775, 0.0794, 0.2252, 0.1642, 0.1239], device='cuda:1'), in_proj_covar=tensor([0.0377, 0.0531, 0.0423, 0.0469, 0.0483, 0.0465, 0.0426, 0.0410], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:13:29,541 INFO [finetune.py:992] (1/2) Epoch 18, batch 9050, loss[loss=0.1374, simple_loss=0.2247, pruned_loss=0.02508, over 12267.00 frames. ], tot_loss[loss=0.165, simple_loss=0.2542, pruned_loss=0.03787, over 2361773.97 frames. ], batch size: 28, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:13:53,798 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317065.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:14:04,042 INFO [finetune.py:992] (1/2) Epoch 18, batch 9100, loss[loss=0.2109, simple_loss=0.2877, pruned_loss=0.06701, over 8012.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.2543, pruned_loss=0.03734, over 2362458.63 frames. ], batch size: 97, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:14:05,708 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.3552, 2.9765, 2.9131, 2.7664, 2.6448, 2.5655, 2.8928, 1.9798], device='cuda:1'), covar=tensor([0.0438, 0.0228, 0.0217, 0.0270, 0.0380, 0.0295, 0.0181, 0.0539], device='cuda:1'), in_proj_covar=tensor([0.0202, 0.0172, 0.0177, 0.0202, 0.0211, 0.0208, 0.0184, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:14:17,888 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.637e+02 3.104e+02 3.914e+02 6.162e+02, threshold=6.208e+02, percent-clipped=0.0 2023-05-18 22:14:40,034 INFO [finetune.py:992] (1/2) Epoch 18, batch 9150, loss[loss=0.1681, simple_loss=0.2632, pruned_loss=0.03646, over 11619.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2533, pruned_loss=0.03703, over 2362520.56 frames. ], batch size: 48, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:14:43,793 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. limit=5.0 2023-05-18 22:15:09,409 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317172.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:15:10,576 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. 
limit=5.0 2023-05-18 22:15:15,086 INFO [finetune.py:992] (1/2) Epoch 18, batch 9200, loss[loss=0.1573, simple_loss=0.2572, pruned_loss=0.02869, over 12290.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2534, pruned_loss=0.0371, over 2370420.25 frames. ], batch size: 37, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:15:25,972 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-18 22:15:28,901 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. limit=2.0 2023-05-18 22:15:28,998 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.558e+02 3.038e+02 3.693e+02 1.521e+03, threshold=6.076e+02, percent-clipped=5.0 2023-05-18 22:15:40,858 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.91 vs. limit=5.0 2023-05-18 22:15:43,236 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317220.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:15:49,942 INFO [finetune.py:992] (1/2) Epoch 18, batch 9250, loss[loss=0.1845, simple_loss=0.2683, pruned_loss=0.05042, over 12201.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2534, pruned_loss=0.03695, over 2378094.21 frames. ], batch size: 35, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:16:03,968 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317250.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:16:10,548 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317258.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:16:25,733 INFO [finetune.py:992] (1/2) Epoch 18, batch 9300, loss[loss=0.1641, simple_loss=0.2594, pruned_loss=0.03437, over 12339.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2534, pruned_loss=0.03665, over 2379345.64 frames. ], batch size: 36, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:16:32,639 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.1854, 6.1115, 5.9223, 5.4467, 5.2855, 6.1128, 5.7067, 5.4647], device='cuda:1'), covar=tensor([0.0762, 0.1087, 0.0680, 0.1804, 0.0678, 0.0713, 0.1741, 0.1116], device='cuda:1'), in_proj_covar=tensor([0.0662, 0.0586, 0.0535, 0.0663, 0.0440, 0.0761, 0.0814, 0.0589], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 22:16:38,164 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317298.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:16:39,419 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 2.630e+02 3.100e+02 3.705e+02 6.033e+02, threshold=6.201e+02, percent-clipped=0.0 2023-05-18 22:16:47,410 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317311.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:17:02,246 INFO [finetune.py:992] (1/2) Epoch 18, batch 9350, loss[loss=0.1571, simple_loss=0.2515, pruned_loss=0.03129, over 12308.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2542, pruned_loss=0.03709, over 2373037.19 frames. 
], batch size: 34, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:17:22,341 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317359.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:17:22,827 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317360.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:17:34,775 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8392, 3.6572, 3.3016, 3.1563, 3.0061, 2.8256, 3.6704, 2.5937], device='cuda:1'), covar=tensor([0.0356, 0.0153, 0.0256, 0.0258, 0.0408, 0.0403, 0.0141, 0.0498], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0172, 0.0175, 0.0200, 0.0210, 0.0207, 0.0183, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:17:36,591 INFO [finetune.py:992] (1/2) Epoch 18, batch 9400, loss[loss=0.1452, simple_loss=0.2263, pruned_loss=0.03203, over 11761.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2532, pruned_loss=0.03703, over 2381892.43 frames. ], batch size: 26, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:17:48,732 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4938, 3.2074, 3.0316, 2.9192, 2.6918, 2.5136, 3.2303, 2.1275], device='cuda:1'), covar=tensor([0.0432, 0.0190, 0.0229, 0.0284, 0.0443, 0.0411, 0.0162, 0.0570], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0172, 0.0175, 0.0201, 0.0210, 0.0207, 0.0183, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:17:50,593 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.641e+02 3.089e+02 3.636e+02 5.893e+02, threshold=6.177e+02, percent-clipped=0.0 2023-05-18 22:17:53,113 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4478, 2.4717, 4.5465, 4.7900, 3.0620, 2.4396, 2.7760, 1.9210], device='cuda:1'), covar=tensor([0.1975, 0.3761, 0.0488, 0.0402, 0.1153, 0.2836, 0.3209, 0.5274], device='cuda:1'), in_proj_covar=tensor([0.0312, 0.0398, 0.0286, 0.0314, 0.0284, 0.0327, 0.0411, 0.0388], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:18:12,804 INFO [finetune.py:992] (1/2) Epoch 18, batch 9450, loss[loss=0.1344, simple_loss=0.2171, pruned_loss=0.0258, over 12267.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2527, pruned_loss=0.03684, over 2380270.63 frames. ], batch size: 28, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:18:24,198 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7484, 3.6185, 3.3237, 3.1036, 2.9514, 2.7106, 3.6670, 2.4696], device='cuda:1'), covar=tensor([0.0360, 0.0143, 0.0175, 0.0253, 0.0387, 0.0380, 0.0112, 0.0503], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0171, 0.0175, 0.0200, 0.0209, 0.0207, 0.0183, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:18:26,585 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.62 vs. 
limit=2.0 2023-05-18 22:18:33,988 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5246, 4.5035, 4.5710, 4.6419, 4.3178, 4.3642, 4.2214, 4.5023], device='cuda:1'), covar=tensor([0.1043, 0.0706, 0.1100, 0.0621, 0.1828, 0.1360, 0.0585, 0.1166], device='cuda:1'), in_proj_covar=tensor([0.0569, 0.0748, 0.0651, 0.0662, 0.0896, 0.0778, 0.0592, 0.0509], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:18:37,484 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5232, 2.6027, 3.6230, 4.5772, 3.8619, 4.5122, 3.8333, 3.2635], device='cuda:1'), covar=tensor([0.0047, 0.0438, 0.0167, 0.0038, 0.0145, 0.0089, 0.0146, 0.0384], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0129, 0.0111, 0.0085, 0.0111, 0.0123, 0.0107, 0.0146], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:18:47,725 INFO [finetune.py:992] (1/2) Epoch 18, batch 9500, loss[loss=0.1411, simple_loss=0.2249, pruned_loss=0.02861, over 12180.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2526, pruned_loss=0.0365, over 2376992.14 frames. ], batch size: 31, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:19:01,322 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.574e+02 2.985e+02 3.737e+02 6.283e+02, threshold=5.970e+02, percent-clipped=1.0 2023-05-18 22:19:21,831 INFO [finetune.py:992] (1/2) Epoch 18, batch 9550, loss[loss=0.1824, simple_loss=0.2709, pruned_loss=0.04691, over 12275.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2536, pruned_loss=0.03707, over 2375229.91 frames. ], batch size: 33, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:19:27,638 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9008, 3.4274, 5.2493, 2.8898, 3.0082, 4.0453, 3.3715, 3.8907], device='cuda:1'), covar=tensor([0.0426, 0.1245, 0.0374, 0.1099, 0.1884, 0.1410, 0.1315, 0.1361], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0245, 0.0270, 0.0189, 0.0245, 0.0303, 0.0232, 0.0276], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:19:42,444 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317558.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:19:49,084 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-18 22:19:57,716 INFO [finetune.py:992] (1/2) Epoch 18, batch 9600, loss[loss=0.1586, simple_loss=0.2547, pruned_loss=0.03122, over 12113.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2533, pruned_loss=0.03653, over 2380669.52 frames. ], batch size: 33, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:20:05,526 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.77 vs. limit=5.0 2023-05-18 22:20:11,256 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.543e+02 3.144e+02 3.767e+02 6.808e+02, threshold=6.287e+02, percent-clipped=1.0 2023-05-18 22:20:15,781 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317606.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:20:15,798 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317606.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:20:32,465 INFO [finetune.py:992] (1/2) Epoch 18, batch 9650, loss[loss=0.1553, simple_loss=0.2455, pruned_loss=0.03257, over 12035.00 frames. 
], tot_loss[loss=0.1633, simple_loss=0.2536, pruned_loss=0.03647, over 2384937.67 frames. ], batch size: 31, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:20:49,062 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=317654.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:20:53,300 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317660.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:21:07,085 INFO [finetune.py:992] (1/2) Epoch 18, batch 9700, loss[loss=0.1714, simple_loss=0.263, pruned_loss=0.0399, over 12355.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.03648, over 2390418.77 frames. ], batch size: 35, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:21:19,855 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-18 22:21:20,761 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.740e+02 3.173e+02 3.841e+02 6.003e+02, threshold=6.347e+02, percent-clipped=0.0 2023-05-18 22:21:27,535 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317708.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:21:42,765 INFO [finetune.py:992] (1/2) Epoch 18, batch 9750, loss[loss=0.1789, simple_loss=0.2731, pruned_loss=0.04236, over 10659.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.03651, over 2374767.55 frames. ], batch size: 68, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:22:11,836 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.93 vs. limit=5.0 2023-05-18 22:22:17,563 INFO [finetune.py:992] (1/2) Epoch 18, batch 9800, loss[loss=0.1514, simple_loss=0.2412, pruned_loss=0.03077, over 12088.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03651, over 2369908.71 frames. ], batch size: 32, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:22:23,254 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6189, 3.4068, 5.1227, 2.7097, 2.7522, 3.9059, 3.0601, 4.0264], device='cuda:1'), covar=tensor([0.0560, 0.1219, 0.0345, 0.1334, 0.2209, 0.1607, 0.1678, 0.1062], device='cuda:1'), in_proj_covar=tensor([0.0242, 0.0243, 0.0268, 0.0188, 0.0243, 0.0300, 0.0231, 0.0274], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:22:25,934 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317792.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:22:31,284 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.623e+02 3.208e+02 4.036e+02 5.837e+02, threshold=6.416e+02, percent-clipped=0.0 2023-05-18 22:22:52,267 INFO [finetune.py:992] (1/2) Epoch 18, batch 9850, loss[loss=0.1595, simple_loss=0.2535, pruned_loss=0.03274, over 12098.00 frames. ], tot_loss[loss=0.163, simple_loss=0.253, pruned_loss=0.03654, over 2369814.53 frames. ], batch size: 33, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:23:01,478 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.52 vs. 
limit=2.0 2023-05-18 22:23:09,354 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317853.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:23:27,240 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4609, 5.0388, 5.4674, 4.7632, 5.0835, 4.9038, 5.5090, 5.0932], device='cuda:1'), covar=tensor([0.0300, 0.0440, 0.0296, 0.0275, 0.0428, 0.0347, 0.0190, 0.0310], device='cuda:1'), in_proj_covar=tensor([0.0288, 0.0292, 0.0316, 0.0287, 0.0285, 0.0284, 0.0259, 0.0232], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:23:27,782 INFO [finetune.py:992] (1/2) Epoch 18, batch 9900, loss[loss=0.1464, simple_loss=0.2313, pruned_loss=0.03077, over 12117.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2526, pruned_loss=0.03688, over 2367616.95 frames. ], batch size: 30, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:23:30,665 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=317884.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:23:35,495 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.1097, 6.0727, 5.8437, 5.3425, 5.2424, 6.0123, 5.6559, 5.4047], device='cuda:1'), covar=tensor([0.0737, 0.0902, 0.0619, 0.1544, 0.0686, 0.0651, 0.1509, 0.1041], device='cuda:1'), in_proj_covar=tensor([0.0663, 0.0588, 0.0536, 0.0665, 0.0443, 0.0765, 0.0815, 0.0593], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:1') 2023-05-18 22:23:41,465 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.738e+02 3.183e+02 3.821e+02 1.015e+03, threshold=6.366e+02, percent-clipped=1.0 2023-05-18 22:23:45,847 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317906.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:23:56,935 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2238, 2.6423, 3.8971, 3.2309, 3.6207, 3.3835, 2.8635, 3.6723], device='cuda:1'), covar=tensor([0.0168, 0.0432, 0.0214, 0.0313, 0.0182, 0.0220, 0.0405, 0.0180], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0219, 0.0208, 0.0202, 0.0235, 0.0182, 0.0211, 0.0207], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:24:02,193 INFO [finetune.py:992] (1/2) Epoch 18, batch 9950, loss[loss=0.1494, simple_loss=0.2358, pruned_loss=0.0315, over 12031.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2526, pruned_loss=0.03645, over 2374245.17 frames. 
], batch size: 31, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:24:11,394 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7321, 2.3899, 2.8293, 3.6227, 2.2780, 3.7136, 3.7316, 3.8266], device='cuda:1'), covar=tensor([0.0249, 0.1295, 0.0649, 0.0255, 0.1405, 0.0402, 0.0307, 0.0169], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0208, 0.0189, 0.0127, 0.0192, 0.0188, 0.0186, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:24:12,698 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=317945.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:24:18,955 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=317954.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:24:18,988 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=317954.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:24:36,619 INFO [finetune.py:992] (1/2) Epoch 18, batch 10000, loss[loss=0.1486, simple_loss=0.2417, pruned_loss=0.02776, over 12305.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.253, pruned_loss=0.03667, over 2372885.16 frames. ], batch size: 34, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:24:39,355 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6330, 3.3140, 3.0764, 3.0467, 2.7619, 2.6186, 3.3681, 2.2084], device='cuda:1'), covar=tensor([0.0399, 0.0179, 0.0232, 0.0244, 0.0422, 0.0412, 0.0157, 0.0564], device='cuda:1'), in_proj_covar=tensor([0.0205, 0.0176, 0.0179, 0.0205, 0.0214, 0.0211, 0.0188, 0.0217], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:24:42,773 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7036, 2.7784, 4.4476, 4.5530, 2.8395, 2.5557, 2.8134, 2.1525], device='cuda:1'), covar=tensor([0.1709, 0.2860, 0.0478, 0.0459, 0.1373, 0.2595, 0.2939, 0.4123], device='cuda:1'), in_proj_covar=tensor([0.0312, 0.0397, 0.0286, 0.0312, 0.0284, 0.0326, 0.0410, 0.0386], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:24:48,874 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.78 vs. limit=5.0 2023-05-18 22:24:51,148 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.631e+02 3.060e+02 3.701e+02 7.144e+02, threshold=6.120e+02, percent-clipped=4.0 2023-05-18 22:24:55,600 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=318002.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:25:14,819 INFO [finetune.py:992] (1/2) Epoch 18, batch 10050, loss[loss=0.1684, simple_loss=0.263, pruned_loss=0.03694, over 12087.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2537, pruned_loss=0.03717, over 2371679.38 frames. ], batch size: 42, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:25:49,342 INFO [finetune.py:992] (1/2) Epoch 18, batch 10100, loss[loss=0.172, simple_loss=0.2651, pruned_loss=0.03941, over 11371.00 frames. ], tot_loss[loss=0.1648, simple_loss=0.2545, pruned_loss=0.03754, over 2373818.98 frames. 
], batch size: 55, lr: 3.21e-03, grad_scale: 16.0 2023-05-18 22:26:03,138 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.661e+02 3.113e+02 3.876e+02 6.644e+02, threshold=6.226e+02, percent-clipped=1.0 2023-05-18 22:26:08,467 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-05-18 22:26:22,475 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5377, 3.5090, 3.1833, 3.1189, 2.7765, 2.6439, 3.5159, 2.3972], device='cuda:1'), covar=tensor([0.0431, 0.0201, 0.0241, 0.0260, 0.0514, 0.0456, 0.0160, 0.0514], device='cuda:1'), in_proj_covar=tensor([0.0205, 0.0175, 0.0178, 0.0203, 0.0213, 0.0210, 0.0187, 0.0216], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:26:23,636 INFO [finetune.py:992] (1/2) Epoch 18, batch 10150, loss[loss=0.1823, simple_loss=0.2707, pruned_loss=0.04693, over 12133.00 frames. ], tot_loss[loss=0.165, simple_loss=0.255, pruned_loss=0.03757, over 2374825.60 frames. ], batch size: 38, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:26:37,338 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318148.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:26:59,515 INFO [finetune.py:992] (1/2) Epoch 18, batch 10200, loss[loss=0.1651, simple_loss=0.2547, pruned_loss=0.03773, over 12193.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.254, pruned_loss=0.03695, over 2380466.56 frames. ], batch size: 35, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:27:13,425 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.573e+02 3.016e+02 3.552e+02 5.688e+02, threshold=6.032e+02, percent-clipped=0.0 2023-05-18 22:27:34,803 INFO [finetune.py:992] (1/2) Epoch 18, batch 10250, loss[loss=0.1608, simple_loss=0.2568, pruned_loss=0.03241, over 12365.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.0363, over 2376301.27 frames. ], batch size: 38, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:27:41,956 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318240.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:28:09,269 INFO [finetune.py:992] (1/2) Epoch 18, batch 10300, loss[loss=0.1785, simple_loss=0.2669, pruned_loss=0.045, over 12135.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.253, pruned_loss=0.03636, over 2384162.33 frames. ], batch size: 34, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:28:24,172 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.654e+02 2.970e+02 3.644e+02 8.965e+02, threshold=5.940e+02, percent-clipped=0.0 2023-05-18 22:28:25,759 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1750, 3.6340, 3.8131, 4.1583, 2.8456, 3.6561, 2.3414, 3.8239], device='cuda:1'), covar=tensor([0.1699, 0.0880, 0.0978, 0.0713, 0.1222, 0.0716, 0.2074, 0.0883], device='cuda:1'), in_proj_covar=tensor([0.0231, 0.0270, 0.0301, 0.0362, 0.0246, 0.0247, 0.0263, 0.0371], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:28:31,760 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318311.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:28:44,851 INFO [finetune.py:992] (1/2) Epoch 18, batch 10350, loss[loss=0.1718, simple_loss=0.2691, pruned_loss=0.03731, over 12190.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2532, pruned_loss=0.03642, over 2385054.45 frames. 
], batch size: 35, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:28:59,808 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318352.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:10,173 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318367.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:13,768 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318372.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:29:17,875 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3079, 5.1845, 5.1331, 5.1359, 4.9241, 5.2985, 5.2646, 5.4239], device='cuda:1'), covar=tensor([0.0220, 0.0132, 0.0149, 0.0312, 0.0611, 0.0224, 0.0152, 0.0140], device='cuda:1'), in_proj_covar=tensor([0.0208, 0.0210, 0.0201, 0.0259, 0.0252, 0.0233, 0.0187, 0.0242], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 22:29:19,082 INFO [finetune.py:992] (1/2) Epoch 18, batch 10400, loss[loss=0.1772, simple_loss=0.2703, pruned_loss=0.04199, over 11665.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2536, pruned_loss=0.03681, over 2377236.38 frames. ], batch size: 48, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:29:32,857 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.765e+02 3.191e+02 3.806e+02 7.414e+02, threshold=6.382e+02, percent-clipped=4.0 2023-05-18 22:29:42,297 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318413.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:53,314 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318428.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:29:54,552 INFO [finetune.py:992] (1/2) Epoch 18, batch 10450, loss[loss=0.1371, simple_loss=0.218, pruned_loss=0.02814, over 12350.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2531, pruned_loss=0.03664, over 2377792.41 frames. ], batch size: 30, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:29:58,172 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5313, 5.1133, 5.5371, 4.8778, 5.1240, 4.9787, 5.5871, 5.1014], device='cuda:1'), covar=tensor([0.0270, 0.0371, 0.0285, 0.0262, 0.0396, 0.0342, 0.0195, 0.0279], device='cuda:1'), in_proj_covar=tensor([0.0287, 0.0290, 0.0313, 0.0285, 0.0283, 0.0282, 0.0257, 0.0231], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:30:07,976 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=318448.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:30:29,649 INFO [finetune.py:992] (1/2) Epoch 18, batch 10500, loss[loss=0.1684, simple_loss=0.2596, pruned_loss=0.03865, over 11638.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2529, pruned_loss=0.03662, over 2387756.79 frames. 
], batch size: 48, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:30:32,805 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4632, 3.3571, 4.8518, 2.4723, 2.7982, 3.6178, 3.2030, 3.7247], device='cuda:1'), covar=tensor([0.0479, 0.1130, 0.0359, 0.1320, 0.1960, 0.1632, 0.1359, 0.1206], device='cuda:1'), in_proj_covar=tensor([0.0241, 0.0242, 0.0268, 0.0187, 0.0243, 0.0300, 0.0230, 0.0273], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:30:40,865 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=318496.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:30:43,543 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.590e+02 3.071e+02 3.704e+02 6.715e+02, threshold=6.142e+02, percent-clipped=1.0 2023-05-18 22:30:43,776 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1284, 2.4542, 3.7674, 3.0601, 3.5727, 3.2678, 2.5922, 3.6561], device='cuda:1'), covar=tensor([0.0159, 0.0437, 0.0157, 0.0303, 0.0161, 0.0219, 0.0423, 0.0147], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0221, 0.0210, 0.0204, 0.0237, 0.0183, 0.0212, 0.0209], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:31:04,509 INFO [finetune.py:992] (1/2) Epoch 18, batch 10550, loss[loss=0.1378, simple_loss=0.2293, pruned_loss=0.02312, over 12010.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2531, pruned_loss=0.03641, over 2382733.80 frames. ], batch size: 31, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:31:11,648 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=318540.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:31:40,722 INFO [finetune.py:992] (1/2) Epoch 18, batch 10600, loss[loss=0.1739, simple_loss=0.2642, pruned_loss=0.04184, over 12111.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2526, pruned_loss=0.03586, over 2389537.39 frames. 
], batch size: 39, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:31:46,283 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=318588.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:31:52,590 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5715, 5.4233, 5.5576, 5.5776, 5.1819, 5.2858, 5.0546, 5.4752], device='cuda:1'), covar=tensor([0.0746, 0.0629, 0.0772, 0.0599, 0.1892, 0.1237, 0.0501, 0.1156], device='cuda:1'), in_proj_covar=tensor([0.0568, 0.0751, 0.0653, 0.0662, 0.0895, 0.0783, 0.0597, 0.0513], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-18 22:31:54,441 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.614e+02 2.999e+02 3.518e+02 7.847e+02, threshold=5.998e+02, percent-clipped=4.0 2023-05-18 22:32:02,253 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8523, 4.7261, 4.8354, 4.8898, 4.4985, 4.6339, 4.4190, 4.7310], device='cuda:1'), covar=tensor([0.0873, 0.0709, 0.0985, 0.0665, 0.2054, 0.1305, 0.0620, 0.1294], device='cuda:1'), in_proj_covar=tensor([0.0568, 0.0751, 0.0653, 0.0662, 0.0896, 0.0784, 0.0597, 0.0513], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-18 22:32:14,207 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5619, 2.3308, 3.7829, 4.5143, 3.9473, 4.4292, 3.8575, 3.1802], device='cuda:1'), covar=tensor([0.0043, 0.0480, 0.0147, 0.0056, 0.0136, 0.0090, 0.0171, 0.0403], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0108, 0.0084, 0.0110, 0.0121, 0.0106, 0.0144], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:32:16,147 INFO [finetune.py:992] (1/2) Epoch 18, batch 10650, loss[loss=0.1739, simple_loss=0.2607, pruned_loss=0.04356, over 12311.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.252, pruned_loss=0.03594, over 2386516.81 frames. ], batch size: 34, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:32:35,441 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5995, 4.4922, 4.4668, 4.4499, 4.1938, 4.6134, 4.5712, 4.7322], device='cuda:1'), covar=tensor([0.0257, 0.0191, 0.0202, 0.0446, 0.0740, 0.0531, 0.0202, 0.0242], device='cuda:1'), in_proj_covar=tensor([0.0208, 0.0210, 0.0201, 0.0258, 0.0251, 0.0234, 0.0187, 0.0243], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 22:32:41,495 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318667.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:32:50,188 INFO [finetune.py:992] (1/2) Epoch 18, batch 10700, loss[loss=0.1952, simple_loss=0.293, pruned_loss=0.04871, over 12188.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2538, pruned_loss=0.03657, over 2378964.10 frames. 
], batch size: 35, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:33:04,730 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.617e+02 3.214e+02 3.905e+02 7.925e+02, threshold=6.428e+02, percent-clipped=1.0 2023-05-18 22:33:06,907 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3577, 5.1940, 5.2676, 5.3408, 4.9275, 5.0150, 4.7860, 5.3014], device='cuda:1'), covar=tensor([0.0749, 0.0729, 0.0987, 0.0663, 0.2119, 0.1383, 0.0624, 0.1156], device='cuda:1'), in_proj_covar=tensor([0.0570, 0.0751, 0.0652, 0.0661, 0.0895, 0.0782, 0.0597, 0.0513], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-18 22:33:10,407 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318708.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:33:17,800 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-18 22:33:19,362 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9256, 4.6045, 4.7250, 4.8451, 4.5641, 4.8717, 4.7811, 2.5648], device='cuda:1'), covar=tensor([0.0100, 0.0077, 0.0100, 0.0063, 0.0060, 0.0106, 0.0077, 0.0867], device='cuda:1'), in_proj_covar=tensor([0.0074, 0.0083, 0.0088, 0.0077, 0.0064, 0.0099, 0.0086, 0.0103], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:33:21,150 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=318723.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:33:25,807 INFO [finetune.py:992] (1/2) Epoch 18, batch 10750, loss[loss=0.1403, simple_loss=0.2317, pruned_loss=0.02441, over 12183.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2524, pruned_loss=0.03599, over 2386353.68 frames. ], batch size: 31, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:33:27,394 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=318732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:33:39,198 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5035, 2.2444, 2.9933, 2.6368, 2.8476, 2.8633, 2.1948, 2.9757], device='cuda:1'), covar=tensor([0.0161, 0.0366, 0.0193, 0.0276, 0.0183, 0.0198, 0.0355, 0.0166], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0218, 0.0208, 0.0201, 0.0234, 0.0181, 0.0210, 0.0208], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:34:00,316 INFO [finetune.py:992] (1/2) Epoch 18, batch 10800, loss[loss=0.1741, simple_loss=0.2711, pruned_loss=0.03853, over 12348.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2531, pruned_loss=0.03622, over 2380104.89 frames. 
], batch size: 36, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:34:09,745 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6395, 3.5692, 3.1857, 3.1716, 2.8528, 2.6772, 3.5928, 2.3822], device='cuda:1'), covar=tensor([0.0372, 0.0141, 0.0213, 0.0237, 0.0399, 0.0412, 0.0148, 0.0487], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0174, 0.0176, 0.0202, 0.0211, 0.0209, 0.0185, 0.0214], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:34:09,758 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=318793.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:34:14,288 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.748e+02 3.175e+02 3.838e+02 7.308e+02, threshold=6.350e+02, percent-clipped=1.0 2023-05-18 22:34:35,442 INFO [finetune.py:992] (1/2) Epoch 18, batch 10850, loss[loss=0.1685, simple_loss=0.2606, pruned_loss=0.03819, over 12354.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.03601, over 2384541.73 frames. ], batch size: 36, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:35:12,115 INFO [finetune.py:992] (1/2) Epoch 18, batch 10900, loss[loss=0.1385, simple_loss=0.2309, pruned_loss=0.02307, over 12358.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2532, pruned_loss=0.03689, over 2383124.47 frames. ], batch size: 31, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:35:25,755 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.603e+02 3.099e+02 4.010e+02 6.423e+02, threshold=6.198e+02, percent-clipped=1.0 2023-05-18 22:35:45,197 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1927, 4.9036, 5.1405, 5.1328, 4.8712, 5.1666, 5.0993, 2.9292], device='cuda:1'), covar=tensor([0.0103, 0.0071, 0.0075, 0.0062, 0.0052, 0.0092, 0.0071, 0.0706], device='cuda:1'), in_proj_covar=tensor([0.0074, 0.0083, 0.0088, 0.0077, 0.0064, 0.0098, 0.0085, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:35:46,279 INFO [finetune.py:992] (1/2) Epoch 18, batch 10950, loss[loss=0.1706, simple_loss=0.2605, pruned_loss=0.04039, over 11594.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2531, pruned_loss=0.03686, over 2381187.73 frames. ], batch size: 48, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:36:04,293 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.01 vs. limit=5.0 2023-05-18 22:36:11,754 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=318967.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:36:20,711 INFO [finetune.py:992] (1/2) Epoch 18, batch 11000, loss[loss=0.1821, simple_loss=0.2764, pruned_loss=0.04385, over 12143.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2559, pruned_loss=0.03817, over 2356659.67 frames. 
], batch size: 38, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:36:35,487 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.719e+02 3.351e+02 4.549e+02 6.103e+02, threshold=6.702e+02, percent-clipped=0.0 2023-05-18 22:36:41,467 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319008.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:36:46,105 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319015.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:36:51,627 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319023.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:36:56,343 INFO [finetune.py:992] (1/2) Epoch 18, batch 11050, loss[loss=0.2144, simple_loss=0.2889, pruned_loss=0.06991, over 8075.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2575, pruned_loss=0.03926, over 2308851.19 frames. ], batch size: 97, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:37:13,989 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319056.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:24,070 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319071.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:27,474 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319076.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:30,555 INFO [finetune.py:992] (1/2) Epoch 18, batch 11100, loss[loss=0.2568, simple_loss=0.3465, pruned_loss=0.08358, over 10213.00 frames. ], tot_loss[loss=0.1724, simple_loss=0.2617, pruned_loss=0.04158, over 2271473.96 frames. ], batch size: 68, lr: 3.20e-03, grad_scale: 32.0 2023-05-18 22:37:32,914 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1228, 4.5297, 4.1246, 4.8909, 4.3885, 2.7840, 4.1176, 2.8796], device='cuda:1'), covar=tensor([0.1025, 0.0802, 0.1482, 0.0562, 0.1331, 0.1909, 0.1252, 0.3674], device='cuda:1'), in_proj_covar=tensor([0.0313, 0.0382, 0.0364, 0.0341, 0.0375, 0.0276, 0.0352, 0.0369], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:37:36,093 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=319088.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:37:44,870 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.031e+02 3.506e+02 4.333e+02 9.817e+02, threshold=7.011e+02, percent-clipped=7.0 2023-05-18 22:38:01,190 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7251, 2.1627, 2.9579, 3.7248, 2.2865, 3.6755, 3.4212, 3.7659], device='cuda:1'), covar=tensor([0.0148, 0.1411, 0.0459, 0.0157, 0.1288, 0.0214, 0.0350, 0.0141], device='cuda:1'), in_proj_covar=tensor([0.0127, 0.0204, 0.0184, 0.0125, 0.0189, 0.0185, 0.0182, 0.0128], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:38:04,817 INFO [finetune.py:992] (1/2) Epoch 18, batch 11150, loss[loss=0.2103, simple_loss=0.302, pruned_loss=0.0593, over 10486.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2679, pruned_loss=0.045, over 2218599.21 frames. 
], batch size: 69, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:38:09,991 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319137.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:38:11,670 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-18 22:38:12,187 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1665, 1.8841, 2.2566, 2.1350, 2.1774, 2.3410, 1.8230, 2.2543], device='cuda:1'), covar=tensor([0.0123, 0.0318, 0.0151, 0.0196, 0.0160, 0.0157, 0.0311, 0.0143], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0215, 0.0204, 0.0197, 0.0230, 0.0179, 0.0206, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:38:24,242 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319158.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:38:31,825 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4952, 4.9847, 5.4691, 4.7849, 5.0292, 4.8730, 5.4775, 5.1602], device='cuda:1'), covar=tensor([0.0295, 0.0474, 0.0297, 0.0282, 0.0491, 0.0404, 0.0249, 0.0255], device='cuda:1'), in_proj_covar=tensor([0.0284, 0.0286, 0.0310, 0.0281, 0.0279, 0.0279, 0.0255, 0.0228], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:38:40,221 INFO [finetune.py:992] (1/2) Epoch 18, batch 11200, loss[loss=0.3028, simple_loss=0.3713, pruned_loss=0.1172, over 7382.00 frames. ], tot_loss[loss=0.1849, simple_loss=0.273, pruned_loss=0.04841, over 2163507.86 frames. ], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:38:54,759 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 3.242e+02 4.075e+02 4.801e+02 8.019e+02, threshold=8.151e+02, percent-clipped=3.0 2023-05-18 22:39:07,036 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319219.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:39:14,145 INFO [finetune.py:992] (1/2) Epoch 18, batch 11250, loss[loss=0.2131, simple_loss=0.3049, pruned_loss=0.06061, over 10431.00 frames. ], tot_loss[loss=0.1928, simple_loss=0.2795, pruned_loss=0.05306, over 2073027.20 frames. ], batch size: 68, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:39:16,576 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-18 22:39:48,365 INFO [finetune.py:992] (1/2) Epoch 18, batch 11300, loss[loss=0.23, simple_loss=0.3114, pruned_loss=0.07429, over 7057.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2851, pruned_loss=0.05668, over 2026493.12 frames. ], batch size: 100, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:39:50,708 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.83 vs. limit=2.0 2023-05-18 22:40:02,531 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.321e+02 3.438e+02 4.108e+02 5.089e+02 1.496e+03, threshold=8.216e+02, percent-clipped=3.0 2023-05-18 22:40:18,622 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.45 vs. limit=5.0 2023-05-18 22:40:22,912 INFO [finetune.py:992] (1/2) Epoch 18, batch 11350, loss[loss=0.255, simple_loss=0.3382, pruned_loss=0.08585, over 6917.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2893, pruned_loss=0.05892, over 1987713.57 frames. 
], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:40:55,862 INFO [finetune.py:992] (1/2) Epoch 18, batch 11400, loss[loss=0.2499, simple_loss=0.3246, pruned_loss=0.08759, over 7339.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.294, pruned_loss=0.06263, over 1912719.42 frames. ], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:41:01,847 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319388.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:41:10,628 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 3.596e+02 4.227e+02 4.901e+02 9.608e+02, threshold=8.454e+02, percent-clipped=1.0 2023-05-18 22:41:29,394 INFO [finetune.py:992] (1/2) Epoch 18, batch 11450, loss[loss=0.2078, simple_loss=0.2946, pruned_loss=0.06051, over 10311.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2973, pruned_loss=0.0651, over 1872638.30 frames. ], batch size: 68, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:41:30,862 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=319432.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:41:33,437 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319436.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:41:48,441 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1250, 5.8301, 5.4828, 5.4501, 5.9353, 5.1752, 5.3486, 5.4094], device='cuda:1'), covar=tensor([0.1375, 0.0892, 0.1012, 0.1577, 0.0782, 0.2188, 0.1785, 0.1024], device='cuda:1'), in_proj_covar=tensor([0.0364, 0.0514, 0.0410, 0.0451, 0.0467, 0.0448, 0.0406, 0.0398], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:42:03,449 INFO [finetune.py:992] (1/2) Epoch 18, batch 11500, loss[loss=0.2228, simple_loss=0.3088, pruned_loss=0.06834, over 10297.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2997, pruned_loss=0.06687, over 1853469.45 frames. ], batch size: 68, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:42:17,237 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.452e+02 3.355e+02 4.114e+02 5.171e+02 1.226e+03, threshold=8.229e+02, percent-clipped=1.0 2023-05-18 22:42:27,196 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=319514.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:42:37,515 INFO [finetune.py:992] (1/2) Epoch 18, batch 11550, loss[loss=0.2454, simple_loss=0.312, pruned_loss=0.08941, over 7204.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.3019, pruned_loss=0.06898, over 1815180.09 frames. ], batch size: 101, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:43:11,002 INFO [finetune.py:992] (1/2) Epoch 18, batch 11600, loss[loss=0.2295, simple_loss=0.3132, pruned_loss=0.07292, over 7491.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.3035, pruned_loss=0.07056, over 1791385.91 frames. ], batch size: 101, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:43:25,090 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.352e+02 3.315e+02 3.958e+02 4.558e+02 7.057e+02, threshold=7.916e+02, percent-clipped=0.0 2023-05-18 22:43:46,160 INFO [finetune.py:992] (1/2) Epoch 18, batch 11650, loss[loss=0.2698, simple_loss=0.321, pruned_loss=0.1093, over 6349.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.3029, pruned_loss=0.07104, over 1770138.00 frames. 
], batch size: 99, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:44:12,855 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6893, 4.5609, 4.5531, 4.6025, 4.2293, 4.7149, 4.6408, 4.7569], device='cuda:1'), covar=tensor([0.0232, 0.0151, 0.0163, 0.0351, 0.0674, 0.0288, 0.0171, 0.0213], device='cuda:1'), in_proj_covar=tensor([0.0189, 0.0191, 0.0183, 0.0234, 0.0229, 0.0211, 0.0170, 0.0223], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:1') 2023-05-18 22:44:19,792 INFO [finetune.py:992] (1/2) Epoch 18, batch 11700, loss[loss=0.2101, simple_loss=0.3024, pruned_loss=0.0589, over 11144.00 frames. ], tot_loss[loss=0.224, simple_loss=0.3036, pruned_loss=0.0722, over 1734693.74 frames. ], batch size: 55, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:44:34,859 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.533e+02 3.366e+02 3.819e+02 4.451e+02 1.022e+03, threshold=7.638e+02, percent-clipped=0.0 2023-05-18 22:44:48,339 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-05-18 22:44:53,823 INFO [finetune.py:992] (1/2) Epoch 18, batch 11750, loss[loss=0.2665, simple_loss=0.3267, pruned_loss=0.1032, over 6633.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.3033, pruned_loss=0.07227, over 1719461.09 frames. ], batch size: 98, lr: 3.20e-03, grad_scale: 16.0 2023-05-18 22:44:55,319 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319732.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:45:16,991 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319763.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:45:28,122 INFO [finetune.py:992] (1/2) Epoch 18, batch 11800, loss[loss=0.2564, simple_loss=0.3256, pruned_loss=0.0936, over 6894.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.3051, pruned_loss=0.07291, over 1727610.82 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:45:28,208 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319780.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:45:42,171 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.502e+02 3.958e+02 4.763e+02 8.177e+02, threshold=7.917e+02, percent-clipped=2.0 2023-05-18 22:45:51,492 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=319814.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:45:58,609 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319824.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:46:02,323 INFO [finetune.py:992] (1/2) Epoch 18, batch 11850, loss[loss=0.192, simple_loss=0.2848, pruned_loss=0.04963, over 11007.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.3054, pruned_loss=0.0724, over 1723151.16 frames. 
], batch size: 55, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:46:11,862 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=319844.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:46:23,511 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=319862.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 22:46:28,731 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5981, 4.5708, 4.5856, 4.6584, 4.3628, 4.3840, 4.2843, 4.5730], device='cuda:1'), covar=tensor([0.0815, 0.0608, 0.0924, 0.0582, 0.1751, 0.1292, 0.0557, 0.1018], device='cuda:1'), in_proj_covar=tensor([0.0535, 0.0695, 0.0611, 0.0616, 0.0828, 0.0730, 0.0557, 0.0478], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:46:36,335 INFO [finetune.py:992] (1/2) Epoch 18, batch 11900, loss[loss=0.2683, simple_loss=0.3386, pruned_loss=0.099, over 6782.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.3056, pruned_loss=0.07196, over 1711392.59 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:46:49,990 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 3.218e+02 3.820e+02 4.516e+02 7.088e+02, threshold=7.640e+02, percent-clipped=0.0 2023-05-18 22:46:52,745 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=319905.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:47:08,938 INFO [finetune.py:992] (1/2) Epoch 18, batch 11950, loss[loss=0.2236, simple_loss=0.3006, pruned_loss=0.07333, over 7372.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.3023, pruned_loss=0.06949, over 1700797.27 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:47:43,130 INFO [finetune.py:992] (1/2) Epoch 18, batch 12000, loss[loss=0.1757, simple_loss=0.2729, pruned_loss=0.03926, over 10380.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2982, pruned_loss=0.06657, over 1679383.06 frames. ], batch size: 69, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:47:43,130 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-18 22:47:50,226 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4517, 4.1676, 4.3268, 4.3303, 4.1569, 4.2566, 4.3956, 2.4228], device='cuda:1'), covar=tensor([0.0131, 0.0110, 0.0132, 0.0105, 0.0081, 0.0181, 0.0146, 0.1133], device='cuda:1'), in_proj_covar=tensor([0.0071, 0.0080, 0.0085, 0.0074, 0.0061, 0.0094, 0.0082, 0.0099], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:47:59,356 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1226, 4.7323, 3.0161, 2.2664, 4.2173, 2.4885, 4.1813, 3.0557], device='cuda:1'), covar=tensor([0.0839, 0.0191, 0.1320, 0.2538, 0.0133, 0.1743, 0.0336, 0.1083], device='cuda:1'), in_proj_covar=tensor([0.0184, 0.0250, 0.0174, 0.0197, 0.0139, 0.0182, 0.0193, 0.0172], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:48:01,395 INFO [finetune.py:1026] (1/2) Epoch 18, validation: loss=0.2892, simple_loss=0.3621, pruned_loss=0.1082, over 1020973.00 frames. 
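The repeated "warmup_begin=..., warmup_end=..., batch_count=..., num_to_drop=..., layers_to_drop=..." entries record a stochastic layer-drop decision for an encoder stack. A minimal sketch of how a warmup-scheduled decision like this could be made is given below; the function name, probability values, and the linear-ramp rule are assumptions for illustration only, not the zipformer.py source.

import random

def pick_layers_to_drop(batch_count, warmup_begin, warmup_end, num_layers,
                        warmup_drop_prob=0.5, baseline_drop_prob=0.05):
    # Assumed rule: a higher drop probability while batch_count lies inside the
    # warmup window, decaying linearly to a small baseline afterwards, which would
    # explain why most entries report num_to_drop=0 and a few drop a single layer.
    if batch_count < warmup_end:
        frac = min(1.0, max(0.0, (warmup_end - batch_count) / (warmup_end - warmup_begin)))
        drop_prob = baseline_drop_prob + (warmup_drop_prob - baseline_drop_prob) * frac
    else:
        drop_prob = baseline_drop_prob
    return {i for i in range(num_layers) if random.random() < drop_prob}

# Example: long after the warmup window (cf. batch_count=319844.0 in the entries
# above) only the baseline probability applies, so the returned set is usually empty.
print(pick_layers_to_drop(319844.0, 666.7, 1333.3, num_layers=4))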
2023-05-18 22:48:01,396 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12411MB 2023-05-18 22:48:08,801 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.9669, 3.8381, 3.9173, 3.6949, 3.8394, 3.7439, 3.9204, 3.5931], device='cuda:1'), covar=tensor([0.0398, 0.0354, 0.0329, 0.0261, 0.0390, 0.0299, 0.0302, 0.1706], device='cuda:1'), in_proj_covar=tensor([0.0268, 0.0271, 0.0292, 0.0267, 0.0266, 0.0263, 0.0241, 0.0216], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:48:18,103 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.845e+02 3.398e+02 3.978e+02 8.382e+02, threshold=6.795e+02, percent-clipped=2.0 2023-05-18 22:48:37,109 INFO [finetune.py:992] (1/2) Epoch 18, batch 12050, loss[loss=0.2019, simple_loss=0.2815, pruned_loss=0.06111, over 6898.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2945, pruned_loss=0.06393, over 1685274.28 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:48:57,534 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320060.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:49:07,700 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-05-18 22:49:09,762 INFO [finetune.py:992] (1/2) Epoch 18, batch 12100, loss[loss=0.2116, simple_loss=0.2899, pruned_loss=0.0667, over 7512.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2938, pruned_loss=0.06281, over 1692355.18 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:49:15,050 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8128, 2.5912, 4.0093, 4.2240, 2.9605, 2.6341, 2.7543, 2.1826], device='cuda:1'), covar=tensor([0.1621, 0.3234, 0.0495, 0.0384, 0.1165, 0.2753, 0.3141, 0.4814], device='cuda:1'), in_proj_covar=tensor([0.0304, 0.0385, 0.0274, 0.0299, 0.0273, 0.0317, 0.0400, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:49:22,979 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.905e+02 3.486e+02 4.017e+02 7.717e+02, threshold=6.973e+02, percent-clipped=2.0 2023-05-18 22:49:34,056 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320119.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:49:35,420 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320121.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:49:40,768 INFO [finetune.py:992] (1/2) Epoch 18, batch 12150, loss[loss=0.2342, simple_loss=0.3104, pruned_loss=0.07903, over 6818.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2954, pruned_loss=0.06345, over 1709209.43 frames. ], batch size: 98, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:50:00,011 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. 
limit=2.0 2023-05-18 22:50:01,608 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320163.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:50:05,266 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.6861, 3.4890, 3.5870, 3.7425, 3.4583, 3.7822, 3.7715, 3.7882], device='cuda:1'), covar=tensor([0.0238, 0.0177, 0.0177, 0.0284, 0.0550, 0.0339, 0.0217, 0.0233], device='cuda:1'), in_proj_covar=tensor([0.0184, 0.0186, 0.0177, 0.0226, 0.0223, 0.0205, 0.0165, 0.0218], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:1') 2023-05-18 22:50:11,723 INFO [finetune.py:992] (1/2) Epoch 18, batch 12200, loss[loss=0.2306, simple_loss=0.3018, pruned_loss=0.07969, over 7199.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2959, pruned_loss=0.0639, over 1685602.47 frames. ], batch size: 101, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:50:24,564 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320200.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:50:25,055 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.335e+02 3.821e+02 4.545e+02 1.101e+03, threshold=7.643e+02, percent-clipped=3.0 2023-05-18 22:50:52,876 INFO [finetune.py:992] (1/2) Epoch 19, batch 0, loss[loss=0.1845, simple_loss=0.2759, pruned_loss=0.04655, over 11655.00 frames. ], tot_loss[loss=0.1845, simple_loss=0.2759, pruned_loss=0.04655, over 11655.00 frames. ], batch size: 48, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:50:52,877 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-18 22:51:09,217 INFO [finetune.py:1026] (1/2) Epoch 19, validation: loss=0.2843, simple_loss=0.3593, pruned_loss=0.1047, over 1020973.00 frames. 2023-05-18 22:51:09,218 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12411MB 2023-05-18 22:51:16,402 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320224.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:51:42,544 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320261.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:51:44,540 INFO [finetune.py:992] (1/2) Epoch 19, batch 50, loss[loss=0.133, simple_loss=0.2227, pruned_loss=0.0216, over 12359.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.2608, pruned_loss=0.03944, over 544959.20 frames. ], batch size: 30, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:51:44,924 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 22:52:11,005 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.864e+02 3.286e+02 3.794e+02 9.424e+02, threshold=6.572e+02, percent-clipped=1.0 2023-05-18 22:52:19,938 INFO [finetune.py:992] (1/2) Epoch 19, batch 100, loss[loss=0.1717, simple_loss=0.2659, pruned_loss=0.03881, over 12185.00 frames. ], tot_loss[loss=0.1681, simple_loss=0.26, pruned_loss=0.03813, over 953581.06 frames. ], batch size: 35, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:52:25,589 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320322.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 22:52:54,374 INFO [finetune.py:992] (1/2) Epoch 19, batch 150, loss[loss=0.1571, simple_loss=0.2458, pruned_loss=0.03424, over 12189.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2589, pruned_loss=0.03827, over 1269650.21 frames. 
], batch size: 31, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:53:06,722 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-05-18 22:53:09,631 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7601, 4.6875, 4.6718, 4.7833, 3.6593, 4.8312, 4.7781, 4.9175], device='cuda:1'), covar=tensor([0.0334, 0.0278, 0.0233, 0.0376, 0.1490, 0.0397, 0.0265, 0.0269], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0193, 0.0184, 0.0235, 0.0231, 0.0212, 0.0171, 0.0225], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:1') 2023-05-18 22:53:19,989 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.659e+02 2.635e+02 3.023e+02 3.573e+02 9.596e+02, threshold=6.047e+02, percent-clipped=2.0 2023-05-18 22:53:22,363 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4034, 4.8576, 3.2384, 2.8742, 4.1593, 2.7350, 4.0702, 3.4609], device='cuda:1'), covar=tensor([0.0773, 0.0478, 0.1130, 0.1635, 0.0301, 0.1480, 0.0540, 0.0878], device='cuda:1'), in_proj_covar=tensor([0.0185, 0.0252, 0.0175, 0.0199, 0.0140, 0.0185, 0.0195, 0.0174], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:53:29,038 INFO [finetune.py:992] (1/2) Epoch 19, batch 200, loss[loss=0.1703, simple_loss=0.2602, pruned_loss=0.0402, over 12083.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2577, pruned_loss=0.03808, over 1508525.15 frames. ], batch size: 32, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:53:31,175 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320416.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:53:33,273 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320419.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:53:36,047 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.0343, 6.0149, 5.7745, 5.3565, 5.2774, 5.9577, 5.5975, 5.4001], device='cuda:1'), covar=tensor([0.0718, 0.0949, 0.0662, 0.1772, 0.0774, 0.0721, 0.1533, 0.1090], device='cuda:1'), in_proj_covar=tensor([0.0638, 0.0572, 0.0516, 0.0638, 0.0427, 0.0731, 0.0774, 0.0570], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-18 22:53:44,286 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1342, 4.8256, 4.9571, 4.9599, 4.8462, 5.0625, 4.9403, 2.8026], device='cuda:1'), covar=tensor([0.0089, 0.0072, 0.0083, 0.0067, 0.0050, 0.0136, 0.0078, 0.0874], device='cuda:1'), in_proj_covar=tensor([0.0071, 0.0081, 0.0086, 0.0075, 0.0062, 0.0096, 0.0083, 0.0101], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:54:00,327 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1953, 4.7078, 5.1492, 4.4018, 4.7125, 4.5525, 5.2071, 4.8957], device='cuda:1'), covar=tensor([0.0377, 0.0560, 0.0384, 0.0334, 0.0549, 0.0449, 0.0237, 0.0388], device='cuda:1'), in_proj_covar=tensor([0.0270, 0.0271, 0.0293, 0.0269, 0.0267, 0.0266, 0.0243, 0.0218], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:54:04,945 INFO [finetune.py:992] (1/2) Epoch 19, batch 250, loss[loss=0.144, simple_loss=0.2288, pruned_loss=0.02965, over 12349.00 frames. 
], tot_loss[loss=0.1659, simple_loss=0.2567, pruned_loss=0.0375, over 1708458.06 frames. ], batch size: 31, lr: 3.19e-03, grad_scale: 16.0 2023-05-18 22:54:05,829 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5895, 2.5746, 4.5348, 4.7408, 2.6876, 2.3980, 2.7949, 2.0294], device='cuda:1'), covar=tensor([0.1871, 0.3478, 0.0529, 0.0438, 0.1468, 0.3072, 0.3568, 0.4821], device='cuda:1'), in_proj_covar=tensor([0.0307, 0.0391, 0.0278, 0.0304, 0.0276, 0.0322, 0.0406, 0.0382], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:54:06,927 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320467.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:54:29,948 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320500.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:54:31,284 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.586e+02 2.848e+02 3.553e+02 7.290e+02, threshold=5.696e+02, percent-clipped=1.0 2023-05-18 22:54:39,770 INFO [finetune.py:992] (1/2) Epoch 19, batch 300, loss[loss=0.1494, simple_loss=0.2357, pruned_loss=0.03152, over 12286.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2553, pruned_loss=0.03661, over 1870011.62 frames. ], batch size: 33, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:54:43,236 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320519.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:54:57,883 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5850, 4.4765, 4.4352, 4.4514, 4.1807, 4.5717, 4.4999, 4.7545], device='cuda:1'), covar=tensor([0.0264, 0.0196, 0.0221, 0.0444, 0.0765, 0.0414, 0.0217, 0.0222], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0197, 0.0188, 0.0240, 0.0236, 0.0217, 0.0175, 0.0230], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:1') 2023-05-18 22:55:01,559 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.50 vs. limit=2.0 2023-05-18 22:55:03,343 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320548.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:55:14,342 INFO [finetune.py:992] (1/2) Epoch 19, batch 350, loss[loss=0.167, simple_loss=0.2703, pruned_loss=0.03187, over 12196.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2551, pruned_loss=0.03668, over 1980677.98 frames. ], batch size: 35, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:55:16,562 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7857, 2.2612, 3.2817, 3.7392, 3.4001, 3.8130, 3.4114, 2.7575], device='cuda:1'), covar=tensor([0.0059, 0.0453, 0.0138, 0.0063, 0.0135, 0.0081, 0.0134, 0.0415], device='cuda:1'), in_proj_covar=tensor([0.0089, 0.0121, 0.0102, 0.0080, 0.0104, 0.0116, 0.0101, 0.0137], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:55:22,969 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-18 22:55:42,319 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.625e+02 3.106e+02 3.632e+02 9.872e+02, threshold=6.212e+02, percent-clipped=3.0 2023-05-18 22:55:50,489 INFO [finetune.py:992] (1/2) Epoch 19, batch 400, loss[loss=0.183, simple_loss=0.2721, pruned_loss=0.04692, over 12275.00 frames. 
], tot_loss[loss=0.1639, simple_loss=0.2548, pruned_loss=0.03649, over 2072648.36 frames. ], batch size: 37, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:55:52,580 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=320617.0, num_to_drop=1, layers_to_drop={2} 2023-05-18 22:56:19,767 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8482, 2.8782, 4.4324, 4.6606, 2.8392, 2.5630, 2.8272, 2.2050], device='cuda:1'), covar=tensor([0.1663, 0.3076, 0.0519, 0.0428, 0.1382, 0.2680, 0.3105, 0.4198], device='cuda:1'), in_proj_covar=tensor([0.0309, 0.0393, 0.0280, 0.0305, 0.0278, 0.0324, 0.0408, 0.0384], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:56:25,036 INFO [finetune.py:992] (1/2) Epoch 19, batch 450, loss[loss=0.1458, simple_loss=0.231, pruned_loss=0.03031, over 11999.00 frames. ], tot_loss[loss=0.1649, simple_loss=0.2558, pruned_loss=0.03698, over 2137974.44 frames. ], batch size: 28, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:56:26,629 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=320666.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:56:51,274 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.664e+02 3.120e+02 3.783e+02 1.584e+03, threshold=6.241e+02, percent-clipped=3.0 2023-05-18 22:56:59,592 INFO [finetune.py:992] (1/2) Epoch 19, batch 500, loss[loss=0.1626, simple_loss=0.2587, pruned_loss=0.03325, over 12307.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2558, pruned_loss=0.03715, over 2197949.45 frames. ], batch size: 34, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:57:01,097 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320716.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:57:09,302 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=320727.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:57:35,293 INFO [finetune.py:992] (1/2) Epoch 19, batch 550, loss[loss=0.1434, simple_loss=0.2245, pruned_loss=0.03118, over 11994.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.255, pruned_loss=0.03691, over 2234414.05 frames. ], batch size: 28, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:57:35,361 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320764.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:58:00,595 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4673, 3.5170, 3.1358, 3.0407, 2.8362, 2.6046, 3.4174, 2.3598], device='cuda:1'), covar=tensor([0.0478, 0.0164, 0.0225, 0.0257, 0.0435, 0.0511, 0.0174, 0.0586], device='cuda:1'), in_proj_covar=tensor([0.0197, 0.0168, 0.0169, 0.0196, 0.0204, 0.0203, 0.0179, 0.0209], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 22:58:01,760 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.446e+02 2.837e+02 3.398e+02 2.517e+03, threshold=5.673e+02, percent-clipped=2.0 2023-05-18 22:58:10,064 INFO [finetune.py:992] (1/2) Epoch 19, batch 600, loss[loss=0.1551, simple_loss=0.2528, pruned_loss=0.02868, over 12307.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.254, pruned_loss=0.03645, over 2262710.02 frames. 
], batch size: 34, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:58:13,650 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320819.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:58:45,545 INFO [finetune.py:992] (1/2) Epoch 19, batch 650, loss[loss=0.1466, simple_loss=0.2312, pruned_loss=0.03104, over 12136.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2544, pruned_loss=0.03657, over 2282377.68 frames. ], batch size: 30, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:58:45,756 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2957, 4.6437, 3.0137, 2.6644, 4.1094, 2.4459, 3.9330, 3.0756], device='cuda:1'), covar=tensor([0.0748, 0.0628, 0.1133, 0.1660, 0.0338, 0.1599, 0.0513, 0.0943], device='cuda:1'), in_proj_covar=tensor([0.0188, 0.0257, 0.0177, 0.0202, 0.0142, 0.0186, 0.0199, 0.0176], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 22:58:48,289 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320867.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 22:58:53,292 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6868, 4.2781, 4.0618, 4.4181, 3.3685, 4.1106, 2.8888, 4.4207], device='cuda:1'), covar=tensor([0.1226, 0.0575, 0.1253, 0.0949, 0.0979, 0.0544, 0.1581, 0.1222], device='cuda:1'), in_proj_covar=tensor([0.0231, 0.0270, 0.0299, 0.0358, 0.0246, 0.0245, 0.0265, 0.0369], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:58:58,888 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0927, 4.6830, 4.7198, 4.9317, 4.8227, 4.8782, 4.8671, 2.5089], device='cuda:1'), covar=tensor([0.0100, 0.0076, 0.0109, 0.0064, 0.0049, 0.0119, 0.0086, 0.0972], device='cuda:1'), in_proj_covar=tensor([0.0072, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0084, 0.0103], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:59:12,577 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.739e+02 2.680e+02 3.217e+02 3.943e+02 5.915e+02, threshold=6.434e+02, percent-clipped=1.0 2023-05-18 22:59:15,689 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0189, 3.5818, 5.3084, 2.7913, 2.8206, 3.7481, 3.1926, 3.8750], device='cuda:1'), covar=tensor([0.0351, 0.1152, 0.0263, 0.1233, 0.2150, 0.1722, 0.1484, 0.1266], device='cuda:1'), in_proj_covar=tensor([0.0237, 0.0239, 0.0259, 0.0186, 0.0239, 0.0292, 0.0226, 0.0267], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 22:59:20,807 INFO [finetune.py:992] (1/2) Epoch 19, batch 700, loss[loss=0.1687, simple_loss=0.258, pruned_loss=0.03972, over 12297.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.253, pruned_loss=0.03636, over 2316568.23 frames. ], batch size: 34, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:59:23,100 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=320917.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 22:59:55,582 INFO [finetune.py:992] (1/2) Epoch 19, batch 750, loss[loss=0.1673, simple_loss=0.258, pruned_loss=0.03837, over 12365.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.254, pruned_loss=0.03661, over 2330648.83 frames. 
], batch size: 36, lr: 3.19e-03, grad_scale: 8.0 2023-05-18 22:59:56,319 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=320965.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:00:21,784 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 2.715e+02 3.173e+02 3.808e+02 6.144e+02, threshold=6.346e+02, percent-clipped=0.0 2023-05-18 23:00:30,890 INFO [finetune.py:992] (1/2) Epoch 19, batch 800, loss[loss=0.1679, simple_loss=0.2614, pruned_loss=0.03722, over 12264.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2534, pruned_loss=0.03617, over 2349995.13 frames. ], batch size: 37, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:00:36,494 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321022.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:01:05,662 INFO [finetune.py:992] (1/2) Epoch 19, batch 850, loss[loss=0.1474, simple_loss=0.2331, pruned_loss=0.03082, over 12190.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2547, pruned_loss=0.03653, over 2343862.00 frames. ], batch size: 31, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:01:23,666 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9781, 4.6129, 4.6040, 4.7797, 4.6636, 4.8357, 4.7689, 2.5621], device='cuda:1'), covar=tensor([0.0119, 0.0079, 0.0120, 0.0074, 0.0057, 0.0105, 0.0078, 0.0965], device='cuda:1'), in_proj_covar=tensor([0.0072, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0084, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:01:31,910 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.562e+02 2.652e+02 3.102e+02 3.680e+02 5.824e+02, threshold=6.205e+02, percent-clipped=0.0 2023-05-18 23:01:40,217 INFO [finetune.py:992] (1/2) Epoch 19, batch 900, loss[loss=0.1847, simple_loss=0.2778, pruned_loss=0.04583, over 11102.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2543, pruned_loss=0.0364, over 2357075.13 frames. ], batch size: 55, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:01:44,134 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-18 23:01:48,195 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321125.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:01:48,801 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321126.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:13,094 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321160.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:15,735 INFO [finetune.py:992] (1/2) Epoch 19, batch 950, loss[loss=0.1435, simple_loss=0.2266, pruned_loss=0.03022, over 12178.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2533, pruned_loss=0.03622, over 2356983.63 frames. 
], batch size: 31, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:02:31,611 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321186.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:32,343 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321187.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:02:42,845 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.656e+02 3.176e+02 3.656e+02 5.814e+02, threshold=6.353e+02, percent-clipped=0.0 2023-05-18 23:02:51,126 INFO [finetune.py:992] (1/2) Epoch 19, batch 1000, loss[loss=0.1535, simple_loss=0.2377, pruned_loss=0.03465, over 12244.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2533, pruned_loss=0.03606, over 2365046.76 frames. ], batch size: 32, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:02:56,226 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321221.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:03:05,820 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5047, 3.7282, 3.2132, 3.1238, 2.9588, 2.7019, 3.5691, 2.3434], device='cuda:1'), covar=tensor([0.0444, 0.0140, 0.0214, 0.0261, 0.0417, 0.0458, 0.0171, 0.0557], device='cuda:1'), in_proj_covar=tensor([0.0199, 0.0169, 0.0172, 0.0198, 0.0207, 0.0207, 0.0181, 0.0212], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:03:08,460 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2644, 6.1974, 5.7838, 5.7219, 6.2804, 5.4767, 5.8117, 5.7615], device='cuda:1'), covar=tensor([0.1503, 0.0896, 0.1147, 0.1858, 0.0865, 0.2336, 0.1673, 0.1205], device='cuda:1'), in_proj_covar=tensor([0.0365, 0.0516, 0.0416, 0.0456, 0.0471, 0.0451, 0.0406, 0.0395], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:03:25,696 INFO [finetune.py:992] (1/2) Epoch 19, batch 1050, loss[loss=0.1456, simple_loss=0.2275, pruned_loss=0.03185, over 12349.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2528, pruned_loss=0.03607, over 2371959.80 frames. ], batch size: 30, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:03:30,977 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-05-18 23:03:47,405 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0175, 5.9658, 5.5583, 5.5712, 6.0700, 5.3015, 5.4770, 5.5955], device='cuda:1'), covar=tensor([0.1726, 0.0947, 0.0973, 0.1849, 0.0858, 0.2503, 0.1989, 0.1140], device='cuda:1'), in_proj_covar=tensor([0.0367, 0.0517, 0.0416, 0.0456, 0.0472, 0.0452, 0.0406, 0.0396], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:03:52,215 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.627e+02 2.582e+02 2.917e+02 3.353e+02 7.382e+02, threshold=5.833e+02, percent-clipped=1.0 2023-05-18 23:04:01,505 INFO [finetune.py:992] (1/2) Epoch 19, batch 1100, loss[loss=0.1373, simple_loss=0.2228, pruned_loss=0.02596, over 12365.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2517, pruned_loss=0.03575, over 2371362.10 frames. 
], batch size: 30, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:04:07,825 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321322.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:04:09,939 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321325.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:04:24,613 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2023-05-18 23:04:36,695 INFO [finetune.py:992] (1/2) Epoch 19, batch 1150, loss[loss=0.1384, simple_loss=0.2243, pruned_loss=0.02627, over 12121.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2514, pruned_loss=0.03565, over 2380617.61 frames. ], batch size: 30, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:04:40,984 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321370.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:04:51,998 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321386.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:03,335 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.618e+02 3.087e+02 3.775e+02 7.185e+02, threshold=6.175e+02, percent-clipped=1.0 2023-05-18 23:05:11,867 INFO [finetune.py:992] (1/2) Epoch 19, batch 1200, loss[loss=0.1731, simple_loss=0.2601, pruned_loss=0.043, over 12285.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2514, pruned_loss=0.03585, over 2380757.45 frames. ], batch size: 33, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:05:16,196 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-18 23:05:18,686 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321424.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:46,539 INFO [finetune.py:992] (1/2) Epoch 19, batch 1250, loss[loss=0.1909, simple_loss=0.2787, pruned_loss=0.05158, over 12107.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2514, pruned_loss=0.03594, over 2386631.29 frames. ], batch size: 39, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:05:53,669 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321473.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:59,155 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321481.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:05:59,851 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321482.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:01,973 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321485.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:13,536 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.547e+02 2.888e+02 3.407e+02 5.181e+02, threshold=5.777e+02, percent-clipped=0.0 2023-05-18 23:06:21,797 INFO [finetune.py:992] (1/2) Epoch 19, batch 1300, loss[loss=0.1546, simple_loss=0.2517, pruned_loss=0.02872, over 12025.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2524, pruned_loss=0.03636, over 2378933.03 frames. 
], batch size: 31, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:06:23,347 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321516.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:35,925 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321534.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:06:56,528 INFO [finetune.py:992] (1/2) Epoch 19, batch 1350, loss[loss=0.1708, simple_loss=0.2644, pruned_loss=0.03863, over 12272.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2516, pruned_loss=0.03598, over 2384586.59 frames. ], batch size: 37, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:07:23,243 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 2.752e+02 3.072e+02 3.598e+02 5.405e+02, threshold=6.145e+02, percent-clipped=0.0 2023-05-18 23:07:32,273 INFO [finetune.py:992] (1/2) Epoch 19, batch 1400, loss[loss=0.1342, simple_loss=0.2188, pruned_loss=0.02477, over 12051.00 frames. ], tot_loss[loss=0.162, simple_loss=0.252, pruned_loss=0.03594, over 2387280.33 frames. ], batch size: 28, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:08:04,672 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4366, 5.3031, 5.3522, 5.4479, 5.0101, 5.1355, 4.8390, 5.3485], device='cuda:1'), covar=tensor([0.0838, 0.0641, 0.0855, 0.0592, 0.2090, 0.1390, 0.0560, 0.1233], device='cuda:1'), in_proj_covar=tensor([0.0565, 0.0730, 0.0639, 0.0645, 0.0872, 0.0763, 0.0583, 0.0506], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:08:08,023 INFO [finetune.py:992] (1/2) Epoch 19, batch 1450, loss[loss=0.1535, simple_loss=0.2554, pruned_loss=0.02581, over 12361.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2511, pruned_loss=0.03524, over 2392070.24 frames. ], batch size: 36, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:08:19,623 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321681.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:08:34,283 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 2.607e+02 3.008e+02 3.618e+02 6.815e+02, threshold=6.016e+02, percent-clipped=1.0 2023-05-18 23:08:42,652 INFO [finetune.py:992] (1/2) Epoch 19, batch 1500, loss[loss=0.1467, simple_loss=0.2467, pruned_loss=0.0234, over 12307.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2513, pruned_loss=0.03563, over 2387786.23 frames. ], batch size: 34, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:09:18,021 INFO [finetune.py:992] (1/2) Epoch 19, batch 1550, loss[loss=0.1688, simple_loss=0.2692, pruned_loss=0.03422, over 12127.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2509, pruned_loss=0.03548, over 2381783.33 frames. 
], batch size: 39, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:09:25,211 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1378, 4.7575, 4.8294, 4.9452, 4.8093, 5.0558, 4.8621, 2.8991], device='cuda:1'), covar=tensor([0.0141, 0.0084, 0.0113, 0.0072, 0.0053, 0.0096, 0.0105, 0.0799], device='cuda:1'), in_proj_covar=tensor([0.0072, 0.0082, 0.0087, 0.0076, 0.0063, 0.0097, 0.0084, 0.0101], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:09:29,283 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321780.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:09:29,661 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2023-05-18 23:09:29,997 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321781.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:09:31,353 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321782.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:09:45,112 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.788e+02 2.545e+02 2.978e+02 3.691e+02 1.085e+03, threshold=5.957e+02, percent-clipped=1.0 2023-05-18 23:09:53,449 INFO [finetune.py:992] (1/2) Epoch 19, batch 1600, loss[loss=0.1468, simple_loss=0.2438, pruned_loss=0.02485, over 12361.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2516, pruned_loss=0.03561, over 2377280.21 frames. ], batch size: 35, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:09:54,977 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321816.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:04,363 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321829.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:04,381 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=321829.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:05,066 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321830.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:08,875 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.93 vs. limit=5.0 2023-05-18 23:10:28,092 INFO [finetune.py:992] (1/2) Epoch 19, batch 1650, loss[loss=0.1502, simple_loss=0.2388, pruned_loss=0.03084, over 12263.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.252, pruned_loss=0.03572, over 2379969.05 frames. 
], batch size: 32, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:10:28,150 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=321864.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:10:46,314 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9364, 3.5870, 5.3239, 2.8258, 3.0461, 3.9747, 3.5078, 3.9136], device='cuda:1'), covar=tensor([0.0431, 0.1133, 0.0341, 0.1187, 0.1931, 0.1543, 0.1292, 0.1325], device='cuda:1'), in_proj_covar=tensor([0.0240, 0.0242, 0.0265, 0.0188, 0.0241, 0.0297, 0.0229, 0.0273], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:10:54,130 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.819e+02 2.848e+02 3.238e+02 3.715e+02 9.115e+02, threshold=6.477e+02, percent-clipped=3.0 2023-05-18 23:11:03,299 INFO [finetune.py:992] (1/2) Epoch 19, batch 1700, loss[loss=0.1768, simple_loss=0.265, pruned_loss=0.04432, over 12025.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2526, pruned_loss=0.03607, over 2369214.00 frames. ], batch size: 40, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:11:19,613 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=321936.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:11:38,943 INFO [finetune.py:992] (1/2) Epoch 19, batch 1750, loss[loss=0.1419, simple_loss=0.2357, pruned_loss=0.02402, over 12132.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.252, pruned_loss=0.0356, over 2370119.61 frames. ], batch size: 33, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:11:50,952 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=321981.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:12:02,158 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=321997.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:12:08,156 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.590e+02 2.927e+02 3.533e+02 7.060e+02, threshold=5.854e+02, percent-clipped=1.0 2023-05-18 23:12:16,656 INFO [finetune.py:992] (1/2) Epoch 19, batch 1800, loss[loss=0.1622, simple_loss=0.2573, pruned_loss=0.0336, over 11682.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2528, pruned_loss=0.036, over 2370315.98 frames. ], batch size: 48, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:12:27,062 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322029.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:12:47,310 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7570, 2.7925, 4.6924, 4.9540, 3.1275, 2.6711, 3.0106, 2.1993], device='cuda:1'), covar=tensor([0.1672, 0.3162, 0.0418, 0.0362, 0.1149, 0.2573, 0.2990, 0.4141], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0398, 0.0283, 0.0311, 0.0283, 0.0327, 0.0412, 0.0387], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:12:52,429 INFO [finetune.py:992] (1/2) Epoch 19, batch 1850, loss[loss=0.1343, simple_loss=0.2212, pruned_loss=0.0237, over 12347.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2536, pruned_loss=0.03641, over 2369991.51 frames. 
], batch size: 31, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:13:03,746 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=322080.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:13:18,673 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.638e+02 2.489e+02 2.964e+02 3.499e+02 6.491e+02, threshold=5.927e+02, percent-clipped=1.0 2023-05-18 23:13:26,990 INFO [finetune.py:992] (1/2) Epoch 19, batch 1900, loss[loss=0.1786, simple_loss=0.2733, pruned_loss=0.04202, over 12050.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2536, pruned_loss=0.03628, over 2376378.53 frames. ], batch size: 40, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:13:36,598 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322128.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:13:37,334 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=322129.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:14:01,591 INFO [finetune.py:992] (1/2) Epoch 19, batch 1950, loss[loss=0.1545, simple_loss=0.2575, pruned_loss=0.02572, over 11347.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2544, pruned_loss=0.03661, over 2373751.16 frames. ], batch size: 55, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:14:10,541 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322177.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:14:28,144 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.612e+02 3.062e+02 3.546e+02 5.710e+02, threshold=6.124e+02, percent-clipped=0.0 2023-05-18 23:14:37,916 INFO [finetune.py:992] (1/2) Epoch 19, batch 2000, loss[loss=0.17, simple_loss=0.2626, pruned_loss=0.03867, over 12359.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2541, pruned_loss=0.03659, over 2370959.36 frames. ], batch size: 36, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:14:38,656 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.0606, 5.9887, 5.9741, 5.1463, 5.2332, 6.0932, 5.2479, 5.4521], device='cuda:1'), covar=tensor([0.1135, 0.1408, 0.0987, 0.2894, 0.0995, 0.1181, 0.3183, 0.1937], device='cuda:1'), in_proj_covar=tensor([0.0662, 0.0597, 0.0543, 0.0671, 0.0446, 0.0767, 0.0819, 0.0594], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:1') 2023-05-18 23:15:13,352 INFO [finetune.py:992] (1/2) Epoch 19, batch 2050, loss[loss=0.1647, simple_loss=0.2499, pruned_loss=0.0398, over 12290.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2526, pruned_loss=0.03608, over 2379023.04 frames. 
], batch size: 33, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:15:29,503 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6335, 2.7436, 3.7769, 4.6425, 4.0081, 4.6380, 3.8964, 3.3573], device='cuda:1'), covar=tensor([0.0043, 0.0397, 0.0129, 0.0053, 0.0116, 0.0078, 0.0141, 0.0341], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0124, 0.0106, 0.0083, 0.0106, 0.0119, 0.0104, 0.0140], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:15:32,913 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=322292.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:15:39,799 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.843e+02 2.505e+02 2.981e+02 3.557e+02 8.250e+02, threshold=5.962e+02, percent-clipped=1.0 2023-05-18 23:15:48,269 INFO [finetune.py:992] (1/2) Epoch 19, batch 2100, loss[loss=0.1534, simple_loss=0.2503, pruned_loss=0.02827, over 12359.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2514, pruned_loss=0.03558, over 2378892.54 frames. ], batch size: 35, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:16:07,435 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-05-18 23:16:24,390 INFO [finetune.py:992] (1/2) Epoch 19, batch 2150, loss[loss=0.1406, simple_loss=0.2242, pruned_loss=0.02849, over 12341.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2508, pruned_loss=0.03529, over 2372061.00 frames. ], batch size: 30, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:16:50,909 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.665e+02 2.497e+02 2.936e+02 3.608e+02 4.796e+02, threshold=5.872e+02, percent-clipped=0.0 2023-05-18 23:16:59,365 INFO [finetune.py:992] (1/2) Epoch 19, batch 2200, loss[loss=0.1543, simple_loss=0.2548, pruned_loss=0.02692, over 12342.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2511, pruned_loss=0.03517, over 2372138.53 frames. ], batch size: 36, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:17:33,625 INFO [finetune.py:992] (1/2) Epoch 19, batch 2250, loss[loss=0.1489, simple_loss=0.2361, pruned_loss=0.03084, over 11991.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2519, pruned_loss=0.03539, over 2373249.69 frames. ], batch size: 28, lr: 3.18e-03, grad_scale: 8.0 2023-05-18 23:18:00,768 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.587e+02 3.015e+02 3.733e+02 1.019e+03, threshold=6.030e+02, percent-clipped=3.0 2023-05-18 23:18:09,541 INFO [finetune.py:992] (1/2) Epoch 19, batch 2300, loss[loss=0.18, simple_loss=0.2709, pruned_loss=0.04453, over 12189.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2518, pruned_loss=0.03549, over 2375255.25 frames. ], batch size: 35, lr: 3.18e-03, grad_scale: 16.0 2023-05-18 23:18:32,267 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 23:18:44,135 INFO [finetune.py:992] (1/2) Epoch 19, batch 2350, loss[loss=0.1674, simple_loss=0.2636, pruned_loss=0.03556, over 12372.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2521, pruned_loss=0.03563, over 2377184.46 frames. 
], batch size: 36, lr: 3.18e-03, grad_scale: 16.0 2023-05-18 23:19:03,604 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=322592.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:19:10,913 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.892e+02 2.735e+02 3.133e+02 3.848e+02 5.750e+02, threshold=6.265e+02, percent-clipped=0.0 2023-05-18 23:19:15,624 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-18 23:19:19,395 INFO [finetune.py:992] (1/2) Epoch 19, batch 2400, loss[loss=0.1764, simple_loss=0.2683, pruned_loss=0.04228, over 12089.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.03593, over 2370529.95 frames. ], batch size: 42, lr: 3.18e-03, grad_scale: 16.0 2023-05-18 23:19:37,418 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=322640.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:19:55,246 INFO [finetune.py:992] (1/2) Epoch 19, batch 2450, loss[loss=0.1634, simple_loss=0.2624, pruned_loss=0.03216, over 12031.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2526, pruned_loss=0.03605, over 2379523.36 frames. ], batch size: 31, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:20:21,561 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.551e+02 2.922e+02 3.455e+02 4.706e+02, threshold=5.844e+02, percent-clipped=0.0 2023-05-18 23:20:29,731 INFO [finetune.py:992] (1/2) Epoch 19, batch 2500, loss[loss=0.1537, simple_loss=0.2431, pruned_loss=0.03214, over 12180.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2521, pruned_loss=0.0357, over 2387953.41 frames. ], batch size: 31, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:21:04,195 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-18 23:21:04,470 INFO [finetune.py:992] (1/2) Epoch 19, batch 2550, loss[loss=0.1777, simple_loss=0.2813, pruned_loss=0.03702, over 12356.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2519, pruned_loss=0.03555, over 2388026.96 frames. ], batch size: 35, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:21:31,738 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.580e+02 3.047e+02 3.667e+02 9.272e+02, threshold=6.094e+02, percent-clipped=2.0 2023-05-18 23:21:40,565 INFO [finetune.py:992] (1/2) Epoch 19, batch 2600, loss[loss=0.1481, simple_loss=0.2352, pruned_loss=0.03055, over 12094.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2524, pruned_loss=0.03565, over 2383743.90 frames. ], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:22:14,664 INFO [finetune.py:992] (1/2) Epoch 19, batch 2650, loss[loss=0.1316, simple_loss=0.2137, pruned_loss=0.02471, over 12193.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2534, pruned_loss=0.03609, over 2385664.42 frames. ], batch size: 29, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:22:35,498 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-05-18 23:22:41,152 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.673e+02 2.569e+02 3.082e+02 3.739e+02 6.003e+02, threshold=6.165e+02, percent-clipped=1.0 2023-05-18 23:22:49,453 INFO [finetune.py:992] (1/2) Epoch 19, batch 2700, loss[loss=0.1427, simple_loss=0.2235, pruned_loss=0.031, over 11797.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03604, over 2383630.48 frames. 
], batch size: 26, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:23:25,453 INFO [finetune.py:992] (1/2) Epoch 19, batch 2750, loss[loss=0.16, simple_loss=0.2401, pruned_loss=0.03989, over 11368.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03585, over 2386911.78 frames. ], batch size: 25, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:23:33,453 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.07 vs. limit=2.0 2023-05-18 23:23:41,437 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=322987.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:23:46,699 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-18 23:23:49,335 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-18 23:23:51,988 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.607e+02 2.529e+02 2.986e+02 3.645e+02 7.663e+02, threshold=5.971e+02, percent-clipped=1.0 2023-05-18 23:24:00,410 INFO [finetune.py:992] (1/2) Epoch 19, batch 2800, loss[loss=0.1559, simple_loss=0.237, pruned_loss=0.03746, over 12204.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2514, pruned_loss=0.03577, over 2386149.35 frames. ], batch size: 29, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:24:00,589 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2048, 2.2185, 3.6489, 4.2057, 3.7147, 4.1949, 3.6713, 2.9633], device='cuda:1'), covar=tensor([0.0057, 0.0534, 0.0148, 0.0058, 0.0137, 0.0096, 0.0182, 0.0412], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0124, 0.0105, 0.0083, 0.0106, 0.0119, 0.0105, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:24:22,004 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3024, 5.1900, 5.2499, 5.3642, 4.9885, 5.0370, 4.7634, 5.2485], device='cuda:1'), covar=tensor([0.0809, 0.0581, 0.0859, 0.0504, 0.1733, 0.1288, 0.0539, 0.0979], device='cuda:1'), in_proj_covar=tensor([0.0576, 0.0745, 0.0653, 0.0660, 0.0892, 0.0775, 0.0593, 0.0516], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:24:24,019 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8399, 3.2272, 2.4538, 2.2761, 2.9228, 2.3401, 3.1054, 2.6938], device='cuda:1'), covar=tensor([0.0577, 0.0654, 0.1046, 0.1403, 0.0315, 0.1210, 0.0503, 0.0824], device='cuda:1'), in_proj_covar=tensor([0.0189, 0.0261, 0.0178, 0.0202, 0.0144, 0.0187, 0.0200, 0.0177], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:24:24,035 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323048.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:24:35,132 INFO [finetune.py:992] (1/2) Epoch 19, batch 2850, loss[loss=0.1773, simple_loss=0.2663, pruned_loss=0.04413, over 12114.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2518, pruned_loss=0.0359, over 2381937.08 frames. 
], batch size: 45, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:24:55,468 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.0913, 6.0967, 5.8693, 5.3444, 5.1426, 5.9697, 5.5989, 5.4104], device='cuda:1'), covar=tensor([0.0881, 0.0977, 0.0717, 0.1714, 0.0758, 0.0864, 0.1715, 0.1050], device='cuda:1'), in_proj_covar=tensor([0.0660, 0.0595, 0.0544, 0.0672, 0.0448, 0.0769, 0.0824, 0.0596], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:1') 2023-05-18 23:25:02,865 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.503e+02 2.936e+02 3.359e+02 7.469e+02, threshold=5.872e+02, percent-clipped=3.0 2023-05-18 23:25:11,040 INFO [finetune.py:992] (1/2) Epoch 19, batch 2900, loss[loss=0.1724, simple_loss=0.2614, pruned_loss=0.04171, over 12144.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03584, over 2382941.27 frames. ], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:25:20,438 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-18 23:25:42,721 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5995, 2.7067, 3.7480, 4.6731, 3.9351, 4.6838, 3.8511, 3.3028], device='cuda:1'), covar=tensor([0.0049, 0.0418, 0.0157, 0.0045, 0.0160, 0.0071, 0.0172, 0.0400], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0125, 0.0106, 0.0083, 0.0106, 0.0120, 0.0106, 0.0142], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:25:46,003 INFO [finetune.py:992] (1/2) Epoch 19, batch 2950, loss[loss=0.1539, simple_loss=0.2427, pruned_loss=0.03256, over 12155.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2515, pruned_loss=0.03573, over 2383877.36 frames. ], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:26:12,551 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.639e+02 2.607e+02 2.944e+02 3.596e+02 8.072e+02, threshold=5.887e+02, percent-clipped=1.0 2023-05-18 23:26:20,536 INFO [finetune.py:992] (1/2) Epoch 19, batch 3000, loss[loss=0.1451, simple_loss=0.2316, pruned_loss=0.02934, over 12292.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2507, pruned_loss=0.03572, over 2373078.62 frames. ], batch size: 33, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:26:20,536 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-18 23:26:32,121 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0556, 3.0175, 4.6522, 2.4157, 2.5100, 3.6692, 2.9731, 3.6444], device='cuda:1'), covar=tensor([0.0654, 0.1494, 0.0286, 0.1453, 0.2202, 0.1429, 0.1670, 0.1190], device='cuda:1'), in_proj_covar=tensor([0.0246, 0.0245, 0.0271, 0.0191, 0.0245, 0.0304, 0.0233, 0.0279], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:26:38,295 INFO [finetune.py:1026] (1/2) Epoch 19, validation: loss=0.3167, simple_loss=0.3909, pruned_loss=0.1212, over 1020973.00 frames. 2023-05-18 23:26:38,295 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12411MB 2023-05-18 23:27:12,523 INFO [finetune.py:992] (1/2) Epoch 19, batch 3050, loss[loss=0.1554, simple_loss=0.2452, pruned_loss=0.03276, over 12253.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2505, pruned_loss=0.03565, over 2378926.62 frames. 
], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:27:18,363 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-18 23:27:38,838 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.734e+02 2.566e+02 3.043e+02 3.630e+02 7.706e+02, threshold=6.086e+02, percent-clipped=3.0 2023-05-18 23:27:47,223 INFO [finetune.py:992] (1/2) Epoch 19, batch 3100, loss[loss=0.1799, simple_loss=0.2722, pruned_loss=0.04383, over 10524.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2511, pruned_loss=0.03571, over 2381420.12 frames. ], batch size: 69, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:28:07,431 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=323343.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:28:23,316 INFO [finetune.py:992] (1/2) Epoch 19, batch 3150, loss[loss=0.1536, simple_loss=0.2419, pruned_loss=0.03263, over 12348.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2504, pruned_loss=0.03524, over 2382279.53 frames. ], batch size: 31, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:28:27,008 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323369.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:28:43,689 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4015, 4.8581, 4.2555, 5.0091, 4.5598, 2.7042, 4.1265, 3.1680], device='cuda:1'), covar=tensor([0.0764, 0.0614, 0.1311, 0.0641, 0.1169, 0.1893, 0.1193, 0.3044], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0385, 0.0369, 0.0343, 0.0380, 0.0282, 0.0354, 0.0373], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:28:49,447 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5106, 5.0552, 5.4702, 4.7892, 5.1361, 4.9342, 5.5202, 5.1017], device='cuda:1'), covar=tensor([0.0286, 0.0383, 0.0328, 0.0278, 0.0391, 0.0348, 0.0228, 0.0279], device='cuda:1'), in_proj_covar=tensor([0.0282, 0.0284, 0.0308, 0.0280, 0.0280, 0.0279, 0.0255, 0.0228], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:28:49,925 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.619e+02 2.536e+02 2.903e+02 3.548e+02 1.078e+03, threshold=5.805e+02, percent-clipped=2.0 2023-05-18 23:28:58,417 INFO [finetune.py:992] (1/2) Epoch 19, batch 3200, loss[loss=0.1551, simple_loss=0.2537, pruned_loss=0.02825, over 12149.00 frames. ], tot_loss[loss=0.16, simple_loss=0.2501, pruned_loss=0.03498, over 2375949.48 frames. ], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:29:09,776 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323430.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:29:28,509 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323457.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:29:33,192 INFO [finetune.py:992] (1/2) Epoch 19, batch 3250, loss[loss=0.1442, simple_loss=0.2278, pruned_loss=0.03026, over 12343.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2506, pruned_loss=0.03523, over 2374792.73 frames. 
], batch size: 30, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:30:00,426 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.588e+02 2.965e+02 3.406e+02 6.970e+02, threshold=5.929e+02, percent-clipped=2.0 2023-05-18 23:30:09,265 INFO [finetune.py:992] (1/2) Epoch 19, batch 3300, loss[loss=0.174, simple_loss=0.2797, pruned_loss=0.03417, over 12177.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2514, pruned_loss=0.03565, over 2379136.20 frames. ], batch size: 35, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:30:12,177 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323518.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:30:43,810 INFO [finetune.py:992] (1/2) Epoch 19, batch 3350, loss[loss=0.1538, simple_loss=0.2398, pruned_loss=0.0339, over 12123.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.251, pruned_loss=0.03558, over 2365956.64 frames. ], batch size: 30, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:31:10,327 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.670e+02 3.086e+02 3.768e+02 7.521e+02, threshold=6.171e+02, percent-clipped=2.0 2023-05-18 23:31:18,787 INFO [finetune.py:992] (1/2) Epoch 19, batch 3400, loss[loss=0.1766, simple_loss=0.2697, pruned_loss=0.04174, over 11762.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03579, over 2370437.20 frames. ], batch size: 44, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:31:25,189 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.15 vs. limit=5.0 2023-05-18 23:31:35,202 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.8985, 3.8649, 3.8567, 3.9693, 3.6285, 3.5760, 3.6373, 3.8323], device='cuda:1'), covar=tensor([0.1606, 0.1038, 0.1805, 0.1000, 0.2530, 0.2021, 0.0815, 0.1408], device='cuda:1'), in_proj_covar=tensor([0.0568, 0.0738, 0.0649, 0.0654, 0.0886, 0.0769, 0.0587, 0.0510], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:31:39,232 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=323643.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:31:54,283 INFO [finetune.py:992] (1/2) Epoch 19, batch 3450, loss[loss=0.1594, simple_loss=0.248, pruned_loss=0.03538, over 12286.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2522, pruned_loss=0.03618, over 2371596.16 frames. 
], batch size: 33, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:31:54,463 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323664.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:31:56,671 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6036, 2.6804, 3.1887, 4.3616, 2.5273, 4.4043, 4.4550, 4.5396], device='cuda:1'), covar=tensor([0.0111, 0.1208, 0.0555, 0.0165, 0.1300, 0.0232, 0.0160, 0.0106], device='cuda:1'), in_proj_covar=tensor([0.0125, 0.0203, 0.0186, 0.0125, 0.0190, 0.0184, 0.0183, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:32:04,702 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2988, 5.1249, 5.2314, 5.2893, 4.9047, 4.9645, 4.7080, 5.2031], device='cuda:1'), covar=tensor([0.0718, 0.0657, 0.0868, 0.0585, 0.2090, 0.1296, 0.0594, 0.1111], device='cuda:1'), in_proj_covar=tensor([0.0569, 0.0740, 0.0650, 0.0655, 0.0889, 0.0771, 0.0588, 0.0511], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:32:12,919 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=323691.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:21,395 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.730e+02 3.210e+02 3.631e+02 5.460e+02, threshold=6.419e+02, percent-clipped=0.0 2023-05-18 23:32:29,805 INFO [finetune.py:992] (1/2) Epoch 19, batch 3500, loss[loss=0.1416, simple_loss=0.2277, pruned_loss=0.02776, over 12420.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2519, pruned_loss=0.03603, over 2381023.79 frames. ], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:32:36,189 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5758, 3.3416, 4.9974, 2.4584, 2.7080, 3.6832, 3.1224, 3.7436], device='cuda:1'), covar=tensor([0.0515, 0.1250, 0.0429, 0.1401, 0.2132, 0.1778, 0.1597, 0.1293], device='cuda:1'), in_proj_covar=tensor([0.0247, 0.0247, 0.0273, 0.0193, 0.0247, 0.0306, 0.0234, 0.0281], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:32:37,338 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=323725.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 23:32:37,441 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323725.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:37,454 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323725.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:32:46,375 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0385, 5.9946, 5.5244, 5.5771, 6.0322, 5.3337, 5.5380, 5.5411], device='cuda:1'), covar=tensor([0.1491, 0.0854, 0.1191, 0.1603, 0.0935, 0.2133, 0.1789, 0.1109], device='cuda:1'), in_proj_covar=tensor([0.0370, 0.0520, 0.0419, 0.0464, 0.0482, 0.0463, 0.0415, 0.0401], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:32:56,811 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5373, 3.0691, 3.8635, 2.2642, 2.6109, 3.0638, 2.8921, 3.1476], device='cuda:1'), covar=tensor([0.0579, 0.1095, 0.0455, 0.1419, 0.1920, 0.1672, 0.1365, 0.1331], device='cuda:1'), in_proj_covar=tensor([0.0247, 0.0247, 0.0273, 0.0193, 0.0247, 0.0306, 0.0234, 0.0281], 
device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:33:04,765 INFO [finetune.py:992] (1/2) Epoch 19, batch 3550, loss[loss=0.1499, simple_loss=0.2357, pruned_loss=0.03206, over 12142.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2525, pruned_loss=0.03623, over 2368540.95 frames. ], batch size: 30, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:33:09,667 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323770.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:33:14,548 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1442, 4.5037, 4.1470, 4.9005, 4.4622, 2.8470, 4.2121, 2.9963], device='cuda:1'), covar=tensor([0.0955, 0.0955, 0.1495, 0.0608, 0.1231, 0.1863, 0.1112, 0.3628], device='cuda:1'), in_proj_covar=tensor([0.0315, 0.0387, 0.0370, 0.0346, 0.0381, 0.0282, 0.0354, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:33:20,778 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323786.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:33:32,470 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.804e+02 2.617e+02 3.192e+02 3.994e+02 7.138e+02, threshold=6.385e+02, percent-clipped=1.0 2023-05-18 23:33:40,332 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=323813.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:33:40,904 INFO [finetune.py:992] (1/2) Epoch 19, batch 3600, loss[loss=0.1715, simple_loss=0.2517, pruned_loss=0.04567, over 8414.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2521, pruned_loss=0.03596, over 2370882.46 frames. ], batch size: 98, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:33:52,782 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=323831.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:34:15,678 INFO [finetune.py:992] (1/2) Epoch 19, batch 3650, loss[loss=0.1782, simple_loss=0.2537, pruned_loss=0.05136, over 7719.00 frames. ], tot_loss[loss=0.161, simple_loss=0.251, pruned_loss=0.03555, over 2375350.45 frames. ], batch size: 98, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:34:20,668 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7534, 3.0682, 4.7754, 5.0269, 2.9172, 2.6505, 3.0946, 2.3071], device='cuda:1'), covar=tensor([0.1785, 0.2910, 0.0462, 0.0406, 0.1437, 0.2796, 0.2876, 0.4287], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0397, 0.0284, 0.0311, 0.0283, 0.0327, 0.0410, 0.0386], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:34:41,547 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 2.739e+02 3.134e+02 3.664e+02 6.861e+02, threshold=6.267e+02, percent-clipped=1.0 2023-05-18 23:34:49,968 INFO [finetune.py:992] (1/2) Epoch 19, batch 3700, loss[loss=0.1649, simple_loss=0.2657, pruned_loss=0.03201, over 10829.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2509, pruned_loss=0.03537, over 2379142.58 frames. ], batch size: 69, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:35:25,635 INFO [finetune.py:992] (1/2) Epoch 19, batch 3750, loss[loss=0.1512, simple_loss=0.2292, pruned_loss=0.03661, over 12193.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.03552, over 2379698.30 frames. 
], batch size: 29, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:35:40,908 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=323986.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:35:46,403 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1282, 4.9879, 4.8738, 4.9554, 4.6622, 5.0972, 5.0441, 5.2374], device='cuda:1'), covar=tensor([0.0218, 0.0161, 0.0204, 0.0363, 0.0772, 0.0323, 0.0171, 0.0175], device='cuda:1'), in_proj_covar=tensor([0.0210, 0.0210, 0.0203, 0.0262, 0.0252, 0.0233, 0.0190, 0.0246], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 23:35:49,209 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4892, 5.0985, 5.4973, 4.7456, 5.1678, 4.8988, 5.5208, 5.0917], device='cuda:1'), covar=tensor([0.0255, 0.0368, 0.0250, 0.0280, 0.0384, 0.0326, 0.0181, 0.0280], device='cuda:1'), in_proj_covar=tensor([0.0282, 0.0284, 0.0308, 0.0280, 0.0279, 0.0278, 0.0254, 0.0227], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:35:55,030 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.453e+02 2.878e+02 3.473e+02 6.728e+02, threshold=5.757e+02, percent-clipped=2.0 2023-05-18 23:36:03,527 INFO [finetune.py:992] (1/2) Epoch 19, batch 3800, loss[loss=0.1671, simple_loss=0.2596, pruned_loss=0.03729, over 12145.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2514, pruned_loss=0.03535, over 2382289.67 frames. ], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:36:07,775 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324020.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:11,337 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324025.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:36:25,215 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7121, 2.7695, 3.7957, 4.8579, 4.0609, 4.6557, 4.0377, 3.5609], device='cuda:1'), covar=tensor([0.0042, 0.0415, 0.0125, 0.0033, 0.0127, 0.0086, 0.0125, 0.0321], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0108, 0.0083, 0.0108, 0.0121, 0.0106, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:36:26,563 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=324047.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:26,630 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1200, 3.2139, 4.4770, 2.4209, 2.6628, 3.3672, 2.9170, 3.4968], device='cuda:1'), covar=tensor([0.0484, 0.1191, 0.0365, 0.1420, 0.1965, 0.1609, 0.1569, 0.1342], device='cuda:1'), in_proj_covar=tensor([0.0246, 0.0246, 0.0272, 0.0192, 0.0247, 0.0305, 0.0234, 0.0280], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:36:38,643 INFO [finetune.py:992] (1/2) Epoch 19, batch 3850, loss[loss=0.1627, simple_loss=0.2548, pruned_loss=0.0353, over 12144.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.03559, over 2378641.05 frames. 
], batch size: 36, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:36:45,076 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324073.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:36:50,707 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324081.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:05,942 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.522e+02 3.008e+02 3.489e+02 6.361e+02, threshold=6.016e+02, percent-clipped=2.0 2023-05-18 23:37:13,978 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324113.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:14,565 INFO [finetune.py:992] (1/2) Epoch 19, batch 3900, loss[loss=0.1355, simple_loss=0.2152, pruned_loss=0.02788, over 12011.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2507, pruned_loss=0.03527, over 2380877.78 frames. ], batch size: 28, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:37:22,796 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324126.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:47,393 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324161.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:37:49,252 INFO [finetune.py:992] (1/2) Epoch 19, batch 3950, loss[loss=0.1713, simple_loss=0.2478, pruned_loss=0.04745, over 12342.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2507, pruned_loss=0.03536, over 2371344.42 frames. ], batch size: 31, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:37:53,117 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-05-18 23:38:05,254 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5492, 5.3471, 5.4412, 5.5517, 5.1364, 5.2533, 5.0129, 5.4174], device='cuda:1'), covar=tensor([0.0658, 0.0588, 0.0830, 0.0498, 0.1759, 0.1255, 0.0497, 0.1099], device='cuda:1'), in_proj_covar=tensor([0.0577, 0.0751, 0.0660, 0.0660, 0.0896, 0.0784, 0.0595, 0.0515], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:38:16,538 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.704e+02 3.151e+02 3.838e+02 9.620e+02, threshold=6.301e+02, percent-clipped=2.0 2023-05-18 23:38:24,965 INFO [finetune.py:992] (1/2) Epoch 19, batch 4000, loss[loss=0.1563, simple_loss=0.2406, pruned_loss=0.03596, over 12092.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2503, pruned_loss=0.0355, over 2362645.82 frames. ], batch size: 32, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:38:25,829 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4876, 5.1062, 5.5137, 4.7604, 5.1324, 4.8853, 5.5487, 5.1039], device='cuda:1'), covar=tensor([0.0281, 0.0345, 0.0246, 0.0272, 0.0422, 0.0349, 0.0190, 0.0279], device='cuda:1'), in_proj_covar=tensor([0.0282, 0.0283, 0.0308, 0.0279, 0.0279, 0.0279, 0.0255, 0.0228], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:38:59,861 INFO [finetune.py:992] (1/2) Epoch 19, batch 4050, loss[loss=0.1687, simple_loss=0.2669, pruned_loss=0.03525, over 12314.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2512, pruned_loss=0.03566, over 2371203.64 frames. 
], batch size: 34, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:39:26,267 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.739e+02 3.058e+02 3.734e+02 6.379e+02, threshold=6.115e+02, percent-clipped=1.0 2023-05-18 23:39:34,531 INFO [finetune.py:992] (1/2) Epoch 19, batch 4100, loss[loss=0.1773, simple_loss=0.2764, pruned_loss=0.03904, over 11515.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2514, pruned_loss=0.03589, over 2370871.61 frames. ], batch size: 48, lr: 3.17e-03, grad_scale: 16.0 2023-05-18 23:39:36,034 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=324316.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:39:38,615 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324320.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:39:50,026 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-05-18 23:39:53,700 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324342.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:09,357 INFO [finetune.py:992] (1/2) Epoch 19, batch 4150, loss[loss=0.1833, simple_loss=0.2745, pruned_loss=0.04601, over 12017.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03587, over 2374368.13 frames. ], batch size: 40, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:40:12,288 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324368.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:16,635 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2357, 4.0089, 4.1861, 4.3008, 2.9791, 3.8436, 2.7766, 3.9049], device='cuda:1'), covar=tensor([0.1828, 0.0782, 0.0802, 0.0686, 0.1319, 0.0706, 0.1920, 0.1332], device='cuda:1'), in_proj_covar=tensor([0.0232, 0.0271, 0.0302, 0.0364, 0.0248, 0.0246, 0.0263, 0.0373], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:40:18,500 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=324377.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:40:21,783 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324381.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:35,954 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+02 2.684e+02 3.086e+02 3.627e+02 7.219e+02, threshold=6.172e+02, percent-clipped=3.0 2023-05-18 23:40:44,285 INFO [finetune.py:992] (1/2) Epoch 19, batch 4200, loss[loss=0.1673, simple_loss=0.258, pruned_loss=0.0383, over 11726.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2527, pruned_loss=0.03618, over 2373242.07 frames. 
], batch size: 48, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:40:52,637 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324426.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:40:52,779 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0250, 3.5956, 5.3239, 2.8006, 2.9463, 3.7609, 3.3740, 3.8131], device='cuda:1'), covar=tensor([0.0412, 0.1114, 0.0341, 0.1195, 0.1983, 0.1676, 0.1307, 0.1453], device='cuda:1'), in_proj_covar=tensor([0.0243, 0.0242, 0.0269, 0.0190, 0.0243, 0.0302, 0.0231, 0.0277], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:40:54,652 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324429.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:41:08,017 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4148, 2.8225, 3.9342, 3.3352, 3.7374, 3.4358, 2.8527, 3.7773], device='cuda:1'), covar=tensor([0.0124, 0.0348, 0.0124, 0.0230, 0.0171, 0.0186, 0.0342, 0.0146], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0217, 0.0203, 0.0199, 0.0232, 0.0180, 0.0209, 0.0205], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:41:18,993 INFO [finetune.py:992] (1/2) Epoch 19, batch 4250, loss[loss=0.1372, simple_loss=0.2201, pruned_loss=0.0271, over 12020.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.03601, over 2380715.72 frames. ], batch size: 28, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:41:26,087 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324474.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:41:34,076 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2023-05-18 23:41:45,973 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.545e+02 2.627e+02 3.023e+02 3.586e+02 8.523e+02, threshold=6.045e+02, percent-clipped=1.0 2023-05-18 23:41:54,193 INFO [finetune.py:992] (1/2) Epoch 19, batch 4300, loss[loss=0.1448, simple_loss=0.222, pruned_loss=0.03379, over 12329.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.253, pruned_loss=0.03634, over 2374637.67 frames. 
], batch size: 31, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:41:58,495 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.1778, 6.0984, 5.8776, 5.4015, 5.3040, 6.0333, 5.6761, 5.4227], device='cuda:1'), covar=tensor([0.0758, 0.1166, 0.0792, 0.1954, 0.0736, 0.0777, 0.1675, 0.1176], device='cuda:1'), in_proj_covar=tensor([0.0653, 0.0591, 0.0539, 0.0669, 0.0445, 0.0763, 0.0816, 0.0592], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 23:42:00,001 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5085, 2.7791, 3.7245, 4.6103, 3.8986, 4.5398, 3.9213, 3.1325], device='cuda:1'), covar=tensor([0.0050, 0.0376, 0.0153, 0.0044, 0.0132, 0.0086, 0.0153, 0.0386], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0108, 0.0084, 0.0108, 0.0121, 0.0106, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:42:02,169 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9691, 4.8179, 4.6843, 4.8589, 4.4821, 4.9446, 4.9812, 5.0628], device='cuda:1'), covar=tensor([0.0358, 0.0176, 0.0225, 0.0313, 0.0855, 0.0316, 0.0206, 0.0199], device='cuda:1'), in_proj_covar=tensor([0.0210, 0.0211, 0.0203, 0.0261, 0.0253, 0.0234, 0.0190, 0.0246], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 23:42:13,507 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-05-18 23:42:29,905 INFO [finetune.py:992] (1/2) Epoch 19, batch 4350, loss[loss=0.1701, simple_loss=0.2541, pruned_loss=0.04302, over 12119.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2526, pruned_loss=0.03658, over 2367317.58 frames. ], batch size: 30, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:42:56,575 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.719e+02 3.276e+02 3.849e+02 6.262e+02, threshold=6.552e+02, percent-clipped=3.0 2023-05-18 23:43:04,801 INFO [finetune.py:992] (1/2) Epoch 19, batch 4400, loss[loss=0.1623, simple_loss=0.2624, pruned_loss=0.03112, over 12145.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.0361, over 2375151.41 frames. ], batch size: 36, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:43:24,935 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324642.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:43:40,182 INFO [finetune.py:992] (1/2) Epoch 19, batch 4450, loss[loss=0.1555, simple_loss=0.2404, pruned_loss=0.03532, over 12360.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.036, over 2372813.11 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:43:43,129 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1585, 4.5933, 4.0345, 4.9404, 4.4627, 2.8499, 4.0271, 2.9877], device='cuda:1'), covar=tensor([0.0881, 0.0802, 0.1449, 0.0443, 0.1141, 0.1899, 0.1286, 0.3518], device='cuda:1'), in_proj_covar=tensor([0.0319, 0.0389, 0.0373, 0.0349, 0.0384, 0.0286, 0.0357, 0.0378], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:43:45,692 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=324672.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:43:55,040 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.11 vs. 
limit=2.0 2023-05-18 23:43:58,687 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=324690.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:44:06,791 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 2.672e+02 3.182e+02 3.780e+02 7.429e+02, threshold=6.365e+02, percent-clipped=1.0 2023-05-18 23:44:15,101 INFO [finetune.py:992] (1/2) Epoch 19, batch 4500, loss[loss=0.1577, simple_loss=0.2585, pruned_loss=0.02849, over 12029.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2527, pruned_loss=0.03603, over 2375735.13 frames. ], batch size: 40, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:44:20,801 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0327, 4.8773, 5.0018, 5.0553, 4.6816, 4.7891, 4.5092, 4.9363], device='cuda:1'), covar=tensor([0.0751, 0.0761, 0.0879, 0.0594, 0.2009, 0.1275, 0.0565, 0.1160], device='cuda:1'), in_proj_covar=tensor([0.0571, 0.0749, 0.0652, 0.0658, 0.0888, 0.0774, 0.0589, 0.0509], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:44:32,132 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-18 23:44:32,825 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-05-18 23:44:34,679 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.00 vs. limit=5.0 2023-05-18 23:44:49,599 INFO [finetune.py:992] (1/2) Epoch 19, batch 4550, loss[loss=0.1688, simple_loss=0.271, pruned_loss=0.03334, over 12113.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2531, pruned_loss=0.03622, over 2368037.21 frames. ], batch size: 39, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:45:17,025 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.742e+02 3.078e+02 3.771e+02 5.857e+02, threshold=6.156e+02, percent-clipped=0.0 2023-05-18 23:45:20,753 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6717, 4.6352, 4.5584, 4.2098, 4.1700, 4.6273, 4.3144, 4.2058], device='cuda:1'), covar=tensor([0.1003, 0.1158, 0.0717, 0.1540, 0.2574, 0.0879, 0.1697, 0.1158], device='cuda:1'), in_proj_covar=tensor([0.0648, 0.0583, 0.0533, 0.0660, 0.0441, 0.0754, 0.0806, 0.0583], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002], device='cuda:1') 2023-05-18 23:45:25,542 INFO [finetune.py:992] (1/2) Epoch 19, batch 4600, loss[loss=0.154, simple_loss=0.2446, pruned_loss=0.03172, over 12119.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2531, pruned_loss=0.03642, over 2347702.02 frames. ], batch size: 33, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:45:36,268 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7055, 2.9865, 4.8326, 4.8449, 2.8010, 2.6634, 3.0225, 2.2575], device='cuda:1'), covar=tensor([0.1861, 0.3119, 0.0434, 0.0488, 0.1428, 0.2758, 0.3055, 0.4409], device='cuda:1'), in_proj_covar=tensor([0.0315, 0.0399, 0.0286, 0.0311, 0.0285, 0.0328, 0.0412, 0.0387], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:46:00,923 INFO [finetune.py:992] (1/2) Epoch 19, batch 4650, loss[loss=0.2056, simple_loss=0.2778, pruned_loss=0.06671, over 7654.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03658, over 2344684.27 frames. 
], batch size: 98, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:46:27,196 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.603e+02 3.036e+02 3.552e+02 5.344e+02, threshold=6.072e+02, percent-clipped=0.0 2023-05-18 23:46:35,307 INFO [finetune.py:992] (1/2) Epoch 19, batch 4700, loss[loss=0.1414, simple_loss=0.2132, pruned_loss=0.03481, over 11987.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2529, pruned_loss=0.03685, over 2348228.27 frames. ], batch size: 28, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:47:10,879 INFO [finetune.py:992] (1/2) Epoch 19, batch 4750, loss[loss=0.1481, simple_loss=0.2433, pruned_loss=0.02642, over 12104.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2521, pruned_loss=0.03673, over 2343136.31 frames. ], batch size: 33, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:47:16,715 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=324972.0, num_to_drop=1, layers_to_drop={0} 2023-05-18 23:47:38,229 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.723e+02 2.992e+02 3.345e+02 4.905e+02, threshold=5.983e+02, percent-clipped=0.0 2023-05-18 23:47:46,643 INFO [finetune.py:992] (1/2) Epoch 19, batch 4800, loss[loss=0.1629, simple_loss=0.2482, pruned_loss=0.03878, over 12182.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2528, pruned_loss=0.03664, over 2352366.73 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:47:50,761 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=325020.0, num_to_drop=1, layers_to_drop={1} 2023-05-18 23:48:21,035 INFO [finetune.py:992] (1/2) Epoch 19, batch 4850, loss[loss=0.1405, simple_loss=0.2219, pruned_loss=0.02962, over 11400.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2536, pruned_loss=0.03675, over 2359979.57 frames. ], batch size: 25, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:48:47,867 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.618e+02 3.035e+02 3.810e+02 8.195e+02, threshold=6.070e+02, percent-clipped=4.0 2023-05-18 23:48:56,302 INFO [finetune.py:992] (1/2) Epoch 19, batch 4900, loss[loss=0.1672, simple_loss=0.259, pruned_loss=0.03776, over 12143.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2532, pruned_loss=0.03665, over 2358353.46 frames. ], batch size: 39, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:49:31,703 INFO [finetune.py:992] (1/2) Epoch 19, batch 4950, loss[loss=0.1781, simple_loss=0.2682, pruned_loss=0.04397, over 12349.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2525, pruned_loss=0.03639, over 2361611.70 frames. 
], batch size: 36, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:49:39,568 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4387, 2.7000, 3.6773, 4.4704, 3.7732, 4.3544, 3.7752, 2.9723], device='cuda:1'), covar=tensor([0.0049, 0.0396, 0.0144, 0.0053, 0.0163, 0.0103, 0.0145, 0.0457], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0107, 0.0084, 0.0108, 0.0121, 0.0106, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:49:53,062 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3655, 2.9678, 3.9995, 3.2670, 3.7532, 3.4152, 2.8432, 3.8434], device='cuda:1'), covar=tensor([0.0135, 0.0334, 0.0140, 0.0250, 0.0156, 0.0176, 0.0364, 0.0137], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0216, 0.0202, 0.0196, 0.0231, 0.0177, 0.0206, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:49:58,744 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.620e+02 3.044e+02 3.772e+02 8.318e+02, threshold=6.087e+02, percent-clipped=2.0 2023-05-18 23:50:07,643 INFO [finetune.py:992] (1/2) Epoch 19, batch 5000, loss[loss=0.1312, simple_loss=0.218, pruned_loss=0.02221, over 12184.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2522, pruned_loss=0.0362, over 2363926.11 frames. ], batch size: 29, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:50:43,201 INFO [finetune.py:992] (1/2) Epoch 19, batch 5050, loss[loss=0.1478, simple_loss=0.2349, pruned_loss=0.03039, over 12341.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2519, pruned_loss=0.03584, over 2372954.65 frames. ], batch size: 30, lr: 3.16e-03, grad_scale: 32.0 2023-05-18 23:51:07,937 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325300.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:51:09,726 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.855e+02 2.639e+02 3.116e+02 3.589e+02 7.174e+02, threshold=6.233e+02, percent-clipped=1.0 2023-05-18 23:51:17,446 INFO [finetune.py:992] (1/2) Epoch 19, batch 5100, loss[loss=0.1662, simple_loss=0.2601, pruned_loss=0.03613, over 12058.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2514, pruned_loss=0.03566, over 2377511.53 frames. 
], batch size: 42, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:51:21,070 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5902, 5.4086, 5.5418, 5.5815, 5.2056, 5.2562, 5.0240, 5.4571], device='cuda:1'), covar=tensor([0.0657, 0.0606, 0.0815, 0.0548, 0.1766, 0.1286, 0.0513, 0.1072], device='cuda:1'), in_proj_covar=tensor([0.0573, 0.0751, 0.0660, 0.0665, 0.0892, 0.0780, 0.0592, 0.0511], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:51:29,277 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3168, 4.7394, 2.9992, 2.7491, 4.0856, 2.7364, 3.9658, 3.3109], device='cuda:1'), covar=tensor([0.0720, 0.0505, 0.1208, 0.1619, 0.0276, 0.1340, 0.0587, 0.0907], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0268, 0.0182, 0.0207, 0.0146, 0.0190, 0.0206, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:51:50,568 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325361.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:51:50,599 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4154, 3.4273, 3.5126, 4.0003, 2.9991, 3.4953, 2.5672, 3.5117], device='cuda:1'), covar=tensor([0.1397, 0.0835, 0.1142, 0.0840, 0.1018, 0.0655, 0.1652, 0.0888], device='cuda:1'), in_proj_covar=tensor([0.0228, 0.0268, 0.0297, 0.0358, 0.0245, 0.0243, 0.0261, 0.0367], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:51:52,460 INFO [finetune.py:992] (1/2) Epoch 19, batch 5150, loss[loss=0.1687, simple_loss=0.2624, pruned_loss=0.03747, over 12109.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.251, pruned_loss=0.03558, over 2373012.16 frames. ], batch size: 38, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:51:57,688 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-18 23:52:13,344 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325394.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:52:19,650 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.499e+02 2.990e+02 3.607e+02 7.752e+02, threshold=5.980e+02, percent-clipped=2.0 2023-05-18 23:52:28,034 INFO [finetune.py:992] (1/2) Epoch 19, batch 5200, loss[loss=0.15, simple_loss=0.2371, pruned_loss=0.03146, over 12348.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2502, pruned_loss=0.03537, over 2374802.28 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:52:49,385 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.99 vs. limit=5.0 2023-05-18 23:52:56,466 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325455.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:53:02,574 INFO [finetune.py:992] (1/2) Epoch 19, batch 5250, loss[loss=0.1604, simple_loss=0.2471, pruned_loss=0.03682, over 12349.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2506, pruned_loss=0.03522, over 2366649.67 frames. 
], batch size: 36, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:53:15,390 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3717, 4.6700, 2.9919, 2.7476, 4.0735, 2.9235, 3.9987, 3.1777], device='cuda:1'), covar=tensor([0.0717, 0.0661, 0.1093, 0.1525, 0.0296, 0.1189, 0.0487, 0.0902], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0265, 0.0181, 0.0206, 0.0146, 0.0188, 0.0204, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:53:18,258 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4810, 2.4924, 3.1817, 4.2484, 2.4531, 4.3347, 4.4760, 4.4971], device='cuda:1'), covar=tensor([0.0154, 0.1393, 0.0535, 0.0212, 0.1354, 0.0253, 0.0154, 0.0128], device='cuda:1'), in_proj_covar=tensor([0.0126, 0.0206, 0.0186, 0.0126, 0.0191, 0.0185, 0.0184, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:53:30,094 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.506e+02 2.926e+02 3.490e+02 1.307e+03, threshold=5.851e+02, percent-clipped=3.0 2023-05-18 23:53:38,296 INFO [finetune.py:992] (1/2) Epoch 19, batch 5300, loss[loss=0.1506, simple_loss=0.228, pruned_loss=0.03661, over 12001.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2504, pruned_loss=0.03511, over 2372600.34 frames. ], batch size: 28, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:54:13,386 INFO [finetune.py:992] (1/2) Epoch 19, batch 5350, loss[loss=0.1755, simple_loss=0.275, pruned_loss=0.03795, over 12121.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2505, pruned_loss=0.03516, over 2376855.11 frames. ], batch size: 38, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:54:15,636 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325567.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:54:21,949 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2085, 5.0757, 4.9672, 5.0570, 4.7533, 5.2104, 5.1296, 5.3681], device='cuda:1'), covar=tensor([0.0239, 0.0158, 0.0210, 0.0339, 0.0700, 0.0367, 0.0159, 0.0188], device='cuda:1'), in_proj_covar=tensor([0.0208, 0.0209, 0.0202, 0.0260, 0.0250, 0.0233, 0.0188, 0.0244], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-18 23:54:40,801 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.848e+02 2.587e+02 3.045e+02 3.701e+02 8.855e+02, threshold=6.090e+02, percent-clipped=2.0 2023-05-18 23:54:48,510 INFO [finetune.py:992] (1/2) Epoch 19, batch 5400, loss[loss=0.1442, simple_loss=0.2356, pruned_loss=0.02637, over 12086.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2512, pruned_loss=0.0354, over 2375098.53 frames. 
], batch size: 32, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:54:58,221 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325628.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:55:02,357 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0369, 4.5304, 3.9696, 4.8471, 4.3313, 2.6983, 4.1073, 2.9018], device='cuda:1'), covar=tensor([0.0927, 0.0814, 0.1562, 0.0570, 0.1298, 0.1984, 0.1081, 0.3595], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0384, 0.0369, 0.0347, 0.0381, 0.0282, 0.0353, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:55:17,945 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=325656.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:55:20,193 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325659.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:55:23,565 INFO [finetune.py:992] (1/2) Epoch 19, batch 5450, loss[loss=0.1565, simple_loss=0.2452, pruned_loss=0.03387, over 12365.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2519, pruned_loss=0.03567, over 2381141.27 frames. ], batch size: 31, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:55:51,582 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.666e+02 3.067e+02 3.561e+02 6.039e+02, threshold=6.133e+02, percent-clipped=0.0 2023-05-18 23:55:59,207 INFO [finetune.py:992] (1/2) Epoch 19, batch 5500, loss[loss=0.1506, simple_loss=0.2295, pruned_loss=0.03587, over 12348.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2513, pruned_loss=0.03542, over 2378467.75 frames. ], batch size: 30, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:56:03,489 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325720.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:56:16,877 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2335, 2.7149, 3.7887, 3.1961, 3.5389, 3.2439, 2.7893, 3.6294], device='cuda:1'), covar=tensor([0.0169, 0.0388, 0.0165, 0.0260, 0.0186, 0.0203, 0.0372, 0.0174], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0220, 0.0206, 0.0200, 0.0235, 0.0180, 0.0210, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:56:24,504 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=325750.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:56:34,087 INFO [finetune.py:992] (1/2) Epoch 19, batch 5550, loss[loss=0.1648, simple_loss=0.2596, pruned_loss=0.03497, over 12354.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2514, pruned_loss=0.03534, over 2382876.41 frames. ], batch size: 36, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:56:52,512 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-18 23:57:02,004 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.494e+02 2.975e+02 3.710e+02 8.699e+02, threshold=5.950e+02, percent-clipped=3.0 2023-05-18 23:57:09,532 INFO [finetune.py:992] (1/2) Epoch 19, batch 5600, loss[loss=0.1644, simple_loss=0.2433, pruned_loss=0.04274, over 12205.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2511, pruned_loss=0.03522, over 2386400.52 frames. 
], batch size: 29, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:57:44,519 INFO [finetune.py:992] (1/2) Epoch 19, batch 5650, loss[loss=0.1774, simple_loss=0.2834, pruned_loss=0.03566, over 11250.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2513, pruned_loss=0.03517, over 2384500.40 frames. ], batch size: 55, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:57:45,773 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-18 23:58:04,039 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-18 23:58:05,806 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=325894.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:58:11,789 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.504e+02 3.020e+02 3.545e+02 6.966e+02, threshold=6.040e+02, percent-clipped=2.0 2023-05-18 23:58:15,091 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-05-18 23:58:19,620 INFO [finetune.py:992] (1/2) Epoch 19, batch 5700, loss[loss=0.1754, simple_loss=0.2599, pruned_loss=0.04545, over 12013.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2513, pruned_loss=0.03481, over 2392745.80 frames. ], batch size: 42, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:58:20,572 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3552, 2.9966, 3.7348, 2.2657, 2.5359, 2.9933, 2.9095, 3.0605], device='cuda:1'), covar=tensor([0.0606, 0.1129, 0.0443, 0.1375, 0.1922, 0.1648, 0.1288, 0.1326], device='cuda:1'), in_proj_covar=tensor([0.0245, 0.0244, 0.0270, 0.0191, 0.0243, 0.0301, 0.0231, 0.0278], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-18 23:58:25,897 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=325923.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:58:49,066 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=325955.0, num_to_drop=1, layers_to_drop={3} 2023-05-18 23:58:49,675 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=325956.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:58:55,285 INFO [finetune.py:992] (1/2) Epoch 19, batch 5750, loss[loss=0.1462, simple_loss=0.2418, pruned_loss=0.02532, over 12269.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2506, pruned_loss=0.03483, over 2388444.55 frames. ], batch size: 32, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:59:25,910 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.547e+02 2.942e+02 3.608e+02 6.203e+02, threshold=5.885e+02, percent-clipped=1.0 2023-05-18 23:59:26,681 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326004.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:59:27,129 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-05-18 23:59:33,644 INFO [finetune.py:992] (1/2) Epoch 19, batch 5800, loss[loss=0.1473, simple_loss=0.2324, pruned_loss=0.03113, over 12268.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2509, pruned_loss=0.03532, over 2375993.62 frames. 
], batch size: 32, lr: 3.16e-03, grad_scale: 16.0 2023-05-18 23:59:34,419 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=326015.0, num_to_drop=0, layers_to_drop=set() 2023-05-18 23:59:45,553 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1949, 2.6822, 3.7780, 3.1482, 3.7577, 3.2576, 2.6307, 3.7012], device='cuda:1'), covar=tensor([0.0201, 0.0467, 0.0223, 0.0302, 0.0170, 0.0227, 0.0459, 0.0168], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0221, 0.0207, 0.0201, 0.0237, 0.0181, 0.0211, 0.0207], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:59:47,726 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3584, 4.6996, 2.8769, 2.7967, 4.0747, 2.9097, 3.9249, 3.4568], device='cuda:1'), covar=tensor([0.0690, 0.0520, 0.1220, 0.1439, 0.0310, 0.1222, 0.0557, 0.0708], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0268, 0.0183, 0.0207, 0.0147, 0.0190, 0.0207, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-18 23:59:56,941 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2501, 4.7879, 4.1782, 5.0802, 4.5358, 2.9786, 4.2537, 3.2077], device='cuda:1'), covar=tensor([0.0896, 0.0708, 0.1482, 0.0599, 0.1263, 0.1829, 0.1162, 0.3155], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0381, 0.0365, 0.0344, 0.0377, 0.0280, 0.0349, 0.0371], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-18 23:59:58,849 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326050.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:00:05,164 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6614, 2.6578, 3.7570, 4.6574, 3.9068, 4.5130, 3.9668, 3.5991], device='cuda:1'), covar=tensor([0.0033, 0.0426, 0.0158, 0.0051, 0.0127, 0.0094, 0.0149, 0.0302], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0108, 0.0084, 0.0107, 0.0120, 0.0105, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:00:08,471 INFO [finetune.py:992] (1/2) Epoch 19, batch 5850, loss[loss=0.179, simple_loss=0.2725, pruned_loss=0.04273, over 11398.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2509, pruned_loss=0.03521, over 2379770.27 frames. 
], batch size: 55, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:00:13,391 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3878, 4.9013, 3.0595, 2.7957, 4.2921, 2.9641, 4.0865, 3.5658], device='cuda:1'), covar=tensor([0.0791, 0.0588, 0.1238, 0.1653, 0.0293, 0.1258, 0.0609, 0.0777], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0268, 0.0183, 0.0208, 0.0147, 0.0190, 0.0207, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:00:18,283 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7530, 3.4124, 5.1492, 2.8242, 3.0016, 4.0377, 3.1462, 3.8739], device='cuda:1'), covar=tensor([0.0472, 0.1197, 0.0337, 0.1098, 0.1859, 0.1410, 0.1481, 0.1288], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0244, 0.0270, 0.0191, 0.0242, 0.0300, 0.0231, 0.0277], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:00:32,127 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326098.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:00:35,481 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.711e+02 3.114e+02 3.588e+02 1.446e+03, threshold=6.229e+02, percent-clipped=3.0 2023-05-19 00:00:43,247 INFO [finetune.py:992] (1/2) Epoch 19, batch 5900, loss[loss=0.1482, simple_loss=0.2376, pruned_loss=0.02943, over 12109.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2514, pruned_loss=0.03582, over 2365847.27 frames. ], batch size: 33, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:00:48,149 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2561, 6.0988, 5.7275, 5.5566, 6.1916, 5.4386, 5.6080, 5.6234], device='cuda:1'), covar=tensor([0.1483, 0.0977, 0.1104, 0.2118, 0.1024, 0.2303, 0.1967, 0.1241], device='cuda:1'), in_proj_covar=tensor([0.0366, 0.0518, 0.0418, 0.0463, 0.0478, 0.0457, 0.0414, 0.0401], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:00:50,418 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5312, 3.5556, 3.2073, 2.9650, 2.7755, 2.6929, 3.5051, 2.4434], device='cuda:1'), covar=tensor([0.0439, 0.0174, 0.0232, 0.0299, 0.0504, 0.0433, 0.0164, 0.0529], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0174, 0.0176, 0.0202, 0.0210, 0.0209, 0.0184, 0.0215], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:01:02,848 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7763, 2.3447, 2.9446, 3.6397, 2.1853, 3.7600, 3.6508, 3.8574], device='cuda:1'), covar=tensor([0.0179, 0.1306, 0.0536, 0.0180, 0.1364, 0.0290, 0.0313, 0.0145], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0210, 0.0188, 0.0127, 0.0193, 0.0187, 0.0187, 0.0132], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:01:09,970 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1626, 2.1364, 2.9799, 2.9165, 3.0401, 3.1533, 2.9794, 2.4505], device='cuda:1'), covar=tensor([0.0105, 0.0475, 0.0214, 0.0146, 0.0166, 0.0129, 0.0161, 0.0432], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0125, 0.0107, 0.0084, 0.0106, 0.0119, 0.0104, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 
00:01:18,820 INFO [finetune.py:992] (1/2) Epoch 19, batch 5950, loss[loss=0.1726, simple_loss=0.2603, pruned_loss=0.04245, over 12297.00 frames. ], tot_loss[loss=0.162, simple_loss=0.252, pruned_loss=0.03601, over 2370244.56 frames. ], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:01:22,556 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 00:01:46,418 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.738e+02 2.527e+02 2.904e+02 3.247e+02 5.520e+02, threshold=5.807e+02, percent-clipped=0.0 2023-05-19 00:01:53,989 INFO [finetune.py:992] (1/2) Epoch 19, batch 6000, loss[loss=0.16, simple_loss=0.2387, pruned_loss=0.04065, over 12325.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2519, pruned_loss=0.03574, over 2380287.94 frames. ], batch size: 30, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:01:53,989 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 00:02:00,682 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4204, 2.0666, 3.4075, 3.4766, 3.5727, 3.6388, 3.5481, 2.5093], device='cuda:1'), covar=tensor([0.0108, 0.0541, 0.0173, 0.0129, 0.0119, 0.0120, 0.0138, 0.0549], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0108, 0.0084, 0.0107, 0.0120, 0.0104, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:02:11,813 INFO [finetune.py:1026] (1/2) Epoch 19, validation: loss=0.3044, simple_loss=0.3846, pruned_loss=0.1121, over 1020973.00 frames. 2023-05-19 00:02:11,813 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12411MB 2023-05-19 00:02:18,397 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326223.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:02:21,154 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=326227.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:02:24,589 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0546, 4.6766, 4.0252, 4.9282, 4.3956, 2.4795, 4.0293, 2.9649], device='cuda:1'), covar=tensor([0.0872, 0.0608, 0.1360, 0.0529, 0.1079, 0.2088, 0.1150, 0.3248], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0384, 0.0369, 0.0347, 0.0380, 0.0283, 0.0352, 0.0375], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:02:37,428 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=326250.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:02:47,165 INFO [finetune.py:992] (1/2) Epoch 19, batch 6050, loss[loss=0.1818, simple_loss=0.2735, pruned_loss=0.04503, over 11300.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2524, pruned_loss=0.0364, over 2373561.13 frames. 
], batch size: 55, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:02:48,842 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7195, 2.8263, 3.4660, 4.4200, 2.6024, 4.4556, 4.6345, 4.7348], device='cuda:1'), covar=tensor([0.0130, 0.1313, 0.0481, 0.0194, 0.1394, 0.0299, 0.0174, 0.0120], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0211, 0.0189, 0.0128, 0.0193, 0.0188, 0.0188, 0.0132], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 00:02:52,190 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326271.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:04,125 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=326288.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:14,240 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.713e+02 3.347e+02 3.935e+02 9.908e+02, threshold=6.694e+02, percent-clipped=3.0 2023-05-19 00:03:16,595 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5448, 2.5951, 3.7096, 4.4754, 3.7672, 4.4697, 3.8915, 3.3833], device='cuda:1'), covar=tensor([0.0044, 0.0442, 0.0162, 0.0065, 0.0161, 0.0088, 0.0143, 0.0364], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0108, 0.0084, 0.0107, 0.0121, 0.0105, 0.0142], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:03:22,135 INFO [finetune.py:992] (1/2) Epoch 19, batch 6100, loss[loss=0.1615, simple_loss=0.2585, pruned_loss=0.03222, over 12352.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2528, pruned_loss=0.03657, over 2365050.29 frames. ], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:03:22,352 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2086, 4.6782, 2.8669, 2.5895, 4.0057, 2.4852, 4.0008, 3.2728], device='cuda:1'), covar=tensor([0.0815, 0.0536, 0.1179, 0.1647, 0.0303, 0.1504, 0.0508, 0.0842], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0265, 0.0181, 0.0205, 0.0146, 0.0188, 0.0204, 0.0179], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:03:23,019 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326315.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:56,987 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326363.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:03:57,612 INFO [finetune.py:992] (1/2) Epoch 19, batch 6150, loss[loss=0.1699, simple_loss=0.2638, pruned_loss=0.03799, over 12046.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2535, pruned_loss=0.0366, over 2364236.67 frames. 
], batch size: 37, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:04:02,804 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0955, 4.5663, 3.9564, 4.8013, 4.2573, 2.7586, 4.1432, 2.8256], device='cuda:1'), covar=tensor([0.0894, 0.0763, 0.1502, 0.0562, 0.1279, 0.1883, 0.1033, 0.3659], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0385, 0.0369, 0.0347, 0.0380, 0.0283, 0.0352, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:04:25,541 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.626e+02 3.138e+02 3.752e+02 1.193e+03, threshold=6.276e+02, percent-clipped=1.0 2023-05-19 00:04:33,140 INFO [finetune.py:992] (1/2) Epoch 19, batch 6200, loss[loss=0.1748, simple_loss=0.2621, pruned_loss=0.04369, over 12118.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2535, pruned_loss=0.03667, over 2374528.86 frames. ], batch size: 39, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:05:07,522 INFO [finetune.py:992] (1/2) Epoch 19, batch 6250, loss[loss=0.1809, simple_loss=0.2778, pruned_loss=0.04202, over 12353.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2536, pruned_loss=0.03656, over 2376610.23 frames. ], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:05:34,824 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.675e+02 3.318e+02 4.019e+02 6.717e+02, threshold=6.635e+02, percent-clipped=1.0 2023-05-19 00:05:42,286 INFO [finetune.py:992] (1/2) Epoch 19, batch 6300, loss[loss=0.1594, simple_loss=0.2439, pruned_loss=0.03749, over 12262.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2542, pruned_loss=0.03682, over 2376398.29 frames. ], batch size: 32, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:05:59,608 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7944, 3.0183, 4.6712, 4.8539, 3.0209, 2.7222, 2.9351, 2.2801], device='cuda:1'), covar=tensor([0.1694, 0.2915, 0.0480, 0.0445, 0.1254, 0.2449, 0.3114, 0.4240], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0402, 0.0288, 0.0312, 0.0286, 0.0330, 0.0414, 0.0390], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:06:07,879 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326550.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:06:17,563 INFO [finetune.py:992] (1/2) Epoch 19, batch 6350, loss[loss=0.1346, simple_loss=0.2213, pruned_loss=0.02398, over 12167.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2529, pruned_loss=0.03676, over 2374082.75 frames. ], batch size: 29, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:06:30,786 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=326583.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:06:40,975 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:06:44,715 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.505e+02 2.938e+02 3.589e+02 9.004e+02, threshold=5.876e+02, percent-clipped=2.0 2023-05-19 00:06:52,545 INFO [finetune.py:992] (1/2) Epoch 19, batch 6400, loss[loss=0.1791, simple_loss=0.2802, pruned_loss=0.03901, over 10482.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2532, pruned_loss=0.03703, over 2359902.83 frames. 
], batch size: 68, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:07:05,913 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9343, 5.8172, 5.3696, 5.2982, 5.9430, 5.0715, 5.4275, 5.3450], device='cuda:1'), covar=tensor([0.1473, 0.0945, 0.1195, 0.1865, 0.0938, 0.2391, 0.1970, 0.1081], device='cuda:1'), in_proj_covar=tensor([0.0369, 0.0520, 0.0420, 0.0464, 0.0481, 0.0462, 0.0416, 0.0402], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:07:21,164 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4766, 4.9699, 3.1967, 2.9314, 4.2084, 3.0242, 4.2305, 3.6104], device='cuda:1'), covar=tensor([0.0731, 0.0514, 0.1043, 0.1397, 0.0330, 0.1210, 0.0476, 0.0763], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0266, 0.0183, 0.0207, 0.0148, 0.0189, 0.0206, 0.0179], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:07:28,626 INFO [finetune.py:992] (1/2) Epoch 19, batch 6450, loss[loss=0.1616, simple_loss=0.2583, pruned_loss=0.03247, over 12344.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2536, pruned_loss=0.03709, over 2365549.95 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:07:55,427 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+02 2.775e+02 3.187e+02 3.833e+02 9.993e+02, threshold=6.373e+02, percent-clipped=7.0 2023-05-19 00:07:58,348 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9323, 5.8125, 5.3474, 5.2948, 5.9339, 5.1462, 5.3640, 5.2680], device='cuda:1'), covar=tensor([0.1561, 0.1038, 0.1224, 0.1893, 0.1008, 0.2300, 0.2115, 0.1179], device='cuda:1'), in_proj_covar=tensor([0.0366, 0.0517, 0.0418, 0.0461, 0.0478, 0.0459, 0.0414, 0.0400], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:08:03,160 INFO [finetune.py:992] (1/2) Epoch 19, batch 6500, loss[loss=0.1588, simple_loss=0.2475, pruned_loss=0.03504, over 12154.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2537, pruned_loss=0.03669, over 2370988.45 frames. ], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:08:37,507 INFO [finetune.py:992] (1/2) Epoch 19, batch 6550, loss[loss=0.1298, simple_loss=0.2159, pruned_loss=0.02186, over 11769.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2534, pruned_loss=0.03654, over 2370161.21 frames. ], batch size: 26, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:09:05,148 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.594e+02 3.171e+02 3.875e+02 1.476e+03, threshold=6.343e+02, percent-clipped=4.0 2023-05-19 00:09:13,164 INFO [finetune.py:992] (1/2) Epoch 19, batch 6600, loss[loss=0.168, simple_loss=0.2664, pruned_loss=0.03479, over 11837.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2532, pruned_loss=0.03629, over 2378951.58 frames. 
], batch size: 44, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:09:15,524 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2256, 3.3798, 3.5989, 3.9067, 2.6845, 3.5165, 2.4947, 3.3910], device='cuda:1'), covar=tensor([0.1668, 0.0932, 0.0947, 0.0684, 0.1283, 0.0704, 0.1978, 0.0813], device='cuda:1'), in_proj_covar=tensor([0.0235, 0.0276, 0.0305, 0.0370, 0.0252, 0.0250, 0.0267, 0.0377], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:09:23,818 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4313, 3.5347, 3.2104, 3.0222, 2.6799, 2.5418, 3.5337, 2.3170], device='cuda:1'), covar=tensor([0.0455, 0.0158, 0.0230, 0.0259, 0.0475, 0.0496, 0.0159, 0.0542], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0174, 0.0176, 0.0202, 0.0211, 0.0209, 0.0183, 0.0216], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:09:44,671 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2132, 3.8810, 4.0604, 4.4225, 3.1003, 3.8915, 2.6740, 4.1119], device='cuda:1'), covar=tensor([0.1593, 0.0854, 0.0843, 0.0662, 0.1124, 0.0649, 0.1817, 0.1051], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0276, 0.0304, 0.0370, 0.0252, 0.0250, 0.0267, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:09:47,929 INFO [finetune.py:992] (1/2) Epoch 19, batch 6650, loss[loss=0.1709, simple_loss=0.262, pruned_loss=0.03991, over 12145.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2538, pruned_loss=0.03654, over 2376456.39 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:10:01,136 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=326883.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:10:14,920 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.587e+02 3.102e+02 3.531e+02 1.010e+03, threshold=6.205e+02, percent-clipped=2.0 2023-05-19 00:10:22,735 INFO [finetune.py:992] (1/2) Epoch 19, batch 6700, loss[loss=0.1493, simple_loss=0.2512, pruned_loss=0.02368, over 12337.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2537, pruned_loss=0.0363, over 2384146.32 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:10:29,035 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2461, 4.5713, 2.8196, 2.5784, 3.8912, 2.6425, 3.9074, 3.1637], device='cuda:1'), covar=tensor([0.0710, 0.0643, 0.1143, 0.1613, 0.0385, 0.1365, 0.0536, 0.0863], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0270, 0.0185, 0.0210, 0.0150, 0.0191, 0.0209, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:10:34,983 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=326931.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:10:58,202 INFO [finetune.py:992] (1/2) Epoch 19, batch 6750, loss[loss=0.1699, simple_loss=0.2647, pruned_loss=0.03757, over 12045.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2536, pruned_loss=0.03621, over 2377557.87 frames. 
], batch size: 42, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:11:01,180 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5908, 5.1086, 5.5701, 4.8859, 5.1862, 4.9846, 5.5977, 5.2728], device='cuda:1'), covar=tensor([0.0273, 0.0375, 0.0249, 0.0240, 0.0470, 0.0364, 0.0206, 0.0213], device='cuda:1'), in_proj_covar=tensor([0.0283, 0.0286, 0.0311, 0.0280, 0.0282, 0.0283, 0.0257, 0.0228], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:11:21,510 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4318, 4.1301, 4.2654, 4.5748, 3.0686, 3.9409, 2.7153, 4.2143], device='cuda:1'), covar=tensor([0.1616, 0.0769, 0.0806, 0.0643, 0.1193, 0.0680, 0.1890, 0.1265], device='cuda:1'), in_proj_covar=tensor([0.0237, 0.0277, 0.0306, 0.0372, 0.0252, 0.0252, 0.0269, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:11:25,743 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.626e+02 2.551e+02 2.862e+02 3.448e+02 6.454e+02, threshold=5.724e+02, percent-clipped=1.0 2023-05-19 00:11:27,300 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327005.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:11:33,280 INFO [finetune.py:992] (1/2) Epoch 19, batch 6800, loss[loss=0.1335, simple_loss=0.2263, pruned_loss=0.02041, over 12149.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2537, pruned_loss=0.03673, over 2361162.96 frames. ], batch size: 30, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:12:08,703 INFO [finetune.py:992] (1/2) Epoch 19, batch 6850, loss[loss=0.163, simple_loss=0.2567, pruned_loss=0.03467, over 12143.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2538, pruned_loss=0.0369, over 2356353.92 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:12:10,254 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327066.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:12:20,356 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0022, 4.5964, 4.7660, 4.9034, 4.7471, 4.9455, 4.8227, 2.6761], device='cuda:1'), covar=tensor([0.0095, 0.0089, 0.0101, 0.0070, 0.0057, 0.0101, 0.0089, 0.0856], device='cuda:1'), in_proj_covar=tensor([0.0073, 0.0085, 0.0088, 0.0077, 0.0064, 0.0099, 0.0086, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:12:35,625 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.502e+02 3.064e+02 3.768e+02 7.870e+02, threshold=6.129e+02, percent-clipped=2.0 2023-05-19 00:12:43,155 INFO [finetune.py:992] (1/2) Epoch 19, batch 6900, loss[loss=0.151, simple_loss=0.2496, pruned_loss=0.02621, over 12358.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2548, pruned_loss=0.03716, over 2361320.75 frames. ], batch size: 36, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:12:57,522 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. 
limit=2.0 2023-05-19 00:13:06,483 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327147.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:13:09,206 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7926, 3.7488, 3.4403, 3.2530, 2.9819, 2.8554, 3.7871, 2.6940], device='cuda:1'), covar=tensor([0.0411, 0.0182, 0.0186, 0.0233, 0.0430, 0.0403, 0.0188, 0.0465], device='cuda:1'), in_proj_covar=tensor([0.0204, 0.0174, 0.0175, 0.0201, 0.0210, 0.0209, 0.0184, 0.0215], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:13:18,032 INFO [finetune.py:992] (1/2) Epoch 19, batch 6950, loss[loss=0.1731, simple_loss=0.2695, pruned_loss=0.03839, over 12035.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.2542, pruned_loss=0.03649, over 2360478.04 frames. ], batch size: 40, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:13:25,735 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1620, 3.9196, 3.9641, 4.2769, 2.7820, 3.8516, 2.7821, 3.9533], device='cuda:1'), covar=tensor([0.1580, 0.0717, 0.0833, 0.0662, 0.1268, 0.0624, 0.1663, 0.1113], device='cuda:1'), in_proj_covar=tensor([0.0233, 0.0274, 0.0301, 0.0367, 0.0249, 0.0248, 0.0264, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:13:45,930 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.614e+02 2.637e+02 3.078e+02 3.550e+02 1.130e+03, threshold=6.155e+02, percent-clipped=5.0 2023-05-19 00:13:49,667 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327208.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:13:53,767 INFO [finetune.py:992] (1/2) Epoch 19, batch 7000, loss[loss=0.1667, simple_loss=0.262, pruned_loss=0.0357, over 11832.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2538, pruned_loss=0.03635, over 2368466.90 frames. ], batch size: 44, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:14:28,995 INFO [finetune.py:992] (1/2) Epoch 19, batch 7050, loss[loss=0.1796, simple_loss=0.2677, pruned_loss=0.04573, over 12006.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.253, pruned_loss=0.03587, over 2374318.66 frames. ], batch size: 40, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:14:55,922 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.682e+02 2.679e+02 2.925e+02 3.583e+02 6.371e+02, threshold=5.851e+02, percent-clipped=2.0 2023-05-19 00:15:03,664 INFO [finetune.py:992] (1/2) Epoch 19, batch 7100, loss[loss=0.1741, simple_loss=0.2689, pruned_loss=0.03963, over 12076.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2528, pruned_loss=0.03583, over 2377032.43 frames. 
], batch size: 42, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:15:27,348 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4506, 3.5232, 3.2759, 3.0864, 2.8206, 2.6700, 3.5438, 2.3862], device='cuda:1'), covar=tensor([0.0488, 0.0216, 0.0232, 0.0263, 0.0432, 0.0456, 0.0155, 0.0547], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0173, 0.0174, 0.0201, 0.0208, 0.0208, 0.0183, 0.0214], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:15:28,747 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1866, 2.6756, 3.7582, 3.1261, 3.5006, 3.2183, 2.7760, 3.6348], device='cuda:1'), covar=tensor([0.0156, 0.0403, 0.0165, 0.0264, 0.0174, 0.0237, 0.0390, 0.0156], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0219, 0.0206, 0.0201, 0.0235, 0.0181, 0.0211, 0.0207], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:15:37,132 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327361.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:15:39,150 INFO [finetune.py:992] (1/2) Epoch 19, batch 7150, loss[loss=0.1662, simple_loss=0.2502, pruned_loss=0.04114, over 12373.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2523, pruned_loss=0.03567, over 2374964.19 frames. ], batch size: 38, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:15:56,050 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.84 vs. limit=5.0 2023-05-19 00:16:04,377 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2053, 3.7920, 3.8672, 4.1797, 2.7537, 3.6361, 2.3796, 3.7694], device='cuda:1'), covar=tensor([0.1607, 0.0783, 0.0924, 0.0690, 0.1277, 0.0690, 0.2119, 0.1004], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0274, 0.0303, 0.0368, 0.0250, 0.0249, 0.0266, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:16:07,233 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.644e+02 3.155e+02 3.771e+02 7.051e+02, threshold=6.311e+02, percent-clipped=3.0 2023-05-19 00:16:14,859 INFO [finetune.py:992] (1/2) Epoch 19, batch 7200, loss[loss=0.155, simple_loss=0.2524, pruned_loss=0.0288, over 12149.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2535, pruned_loss=0.03647, over 2360697.81 frames. ], batch size: 34, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:16:49,217 INFO [finetune.py:992] (1/2) Epoch 19, batch 7250, loss[loss=0.1501, simple_loss=0.2321, pruned_loss=0.03404, over 12342.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2535, pruned_loss=0.03675, over 2356735.77 frames. ], batch size: 31, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:17:17,153 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.676e+02 2.734e+02 3.338e+02 3.881e+02 7.331e+02, threshold=6.676e+02, percent-clipped=2.0 2023-05-19 00:17:17,249 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327503.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:17:24,845 INFO [finetune.py:992] (1/2) Epoch 19, batch 7300, loss[loss=0.1401, simple_loss=0.2195, pruned_loss=0.03038, over 12185.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2527, pruned_loss=0.03649, over 2362565.28 frames. 
], batch size: 29, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:17:25,026 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327514.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:17:32,442 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.30 vs. limit=5.0 2023-05-19 00:17:59,161 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327562.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:18:00,404 INFO [finetune.py:992] (1/2) Epoch 19, batch 7350, loss[loss=0.1529, simple_loss=0.2467, pruned_loss=0.02958, over 12303.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2528, pruned_loss=0.03625, over 2374929.81 frames. ], batch size: 33, lr: 3.15e-03, grad_scale: 32.0 2023-05-19 00:18:07,943 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327575.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:18:28,177 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.778e+02 2.443e+02 2.967e+02 3.435e+02 6.066e+02, threshold=5.934e+02, percent-clipped=0.0 2023-05-19 00:18:34,918 INFO [finetune.py:992] (1/2) Epoch 19, batch 7400, loss[loss=0.1768, simple_loss=0.2687, pruned_loss=0.04244, over 12367.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2528, pruned_loss=0.03644, over 2380170.54 frames. ], batch size: 35, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:18:41,367 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=327623.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:19:07,831 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=327661.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:19:09,491 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.21 vs. limit=5.0 2023-05-19 00:19:09,781 INFO [finetune.py:992] (1/2) Epoch 19, batch 7450, loss[loss=0.2112, simple_loss=0.2957, pruned_loss=0.06328, over 8081.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2528, pruned_loss=0.0363, over 2380300.42 frames. ], batch size: 98, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:19:21,020 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. limit=5.0 2023-05-19 00:19:25,549 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.8316, 5.8439, 5.6399, 5.1777, 5.1431, 5.7390, 5.3114, 5.1789], device='cuda:1'), covar=tensor([0.0882, 0.0992, 0.0736, 0.1654, 0.0793, 0.0820, 0.1849, 0.1023], device='cuda:1'), in_proj_covar=tensor([0.0676, 0.0601, 0.0556, 0.0677, 0.0454, 0.0787, 0.0833, 0.0600], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:19:32,875 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.53 vs. limit=5.0 2023-05-19 00:19:37,982 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.581e+02 3.083e+02 3.815e+02 6.461e+02, threshold=6.167e+02, percent-clipped=2.0 2023-05-19 00:19:41,462 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=327709.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:19:44,890 INFO [finetune.py:992] (1/2) Epoch 19, batch 7500, loss[loss=0.1452, simple_loss=0.241, pruned_loss=0.02467, over 12144.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.253, pruned_loss=0.03659, over 2371945.06 frames. 
], batch size: 34, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:20:19,488 INFO [finetune.py:992] (1/2) Epoch 19, batch 7550, loss[loss=0.1617, simple_loss=0.2451, pruned_loss=0.03914, over 12265.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03633, over 2371483.32 frames. ], batch size: 28, lr: 3.15e-03, grad_scale: 16.0 2023-05-19 00:20:34,198 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7536, 3.0028, 3.8437, 4.6401, 3.8639, 4.7695, 4.0787, 3.2931], device='cuda:1'), covar=tensor([0.0032, 0.0357, 0.0146, 0.0057, 0.0142, 0.0056, 0.0114, 0.0360], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0123, 0.0106, 0.0083, 0.0106, 0.0119, 0.0104, 0.0139], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:20:47,762 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=327803.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:20:48,340 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.705e+02 3.112e+02 3.698e+02 7.548e+02, threshold=6.224e+02, percent-clipped=1.0 2023-05-19 00:20:55,364 INFO [finetune.py:992] (1/2) Epoch 19, batch 7600, loss[loss=0.1468, simple_loss=0.2257, pruned_loss=0.03395, over 12036.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.252, pruned_loss=0.03608, over 2373884.86 frames. ], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:20:58,484 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.20 vs. limit=5.0 2023-05-19 00:21:10,427 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 00:21:21,678 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=327851.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:21:30,796 INFO [finetune.py:992] (1/2) Epoch 19, batch 7650, loss[loss=0.213, simple_loss=0.2897, pruned_loss=0.06814, over 8159.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2529, pruned_loss=0.03653, over 2364082.11 frames. ], batch size: 97, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:21:35,020 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327870.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:21:43,198 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.75 vs. limit=2.0 2023-05-19 00:21:58,472 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.548e+02 2.582e+02 3.012e+02 3.558e+02 7.024e+02, threshold=6.024e+02, percent-clipped=1.0 2023-05-19 00:22:05,317 INFO [finetune.py:992] (1/2) Epoch 19, batch 7700, loss[loss=0.209, simple_loss=0.2809, pruned_loss=0.06851, over 7795.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2527, pruned_loss=0.03624, over 2364821.37 frames. ], batch size: 98, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:22:08,186 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=327918.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:22:40,558 INFO [finetune.py:992] (1/2) Epoch 19, batch 7750, loss[loss=0.1405, simple_loss=0.2313, pruned_loss=0.02482, over 12324.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2527, pruned_loss=0.03614, over 2366658.51 frames. 
], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:22:59,519 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327990.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:03,829 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=327996.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:10,614 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328002.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:10,682 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0222, 4.5546, 3.9250, 4.7614, 4.2639, 2.7740, 3.9500, 2.9755], device='cuda:1'), covar=tensor([0.1010, 0.0810, 0.1675, 0.0641, 0.1277, 0.1900, 0.1153, 0.3434], device='cuda:1'), in_proj_covar=tensor([0.0319, 0.0389, 0.0373, 0.0351, 0.0382, 0.0284, 0.0357, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:23:10,702 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6079, 2.7180, 4.4171, 4.5210, 2.8581, 2.5872, 2.7617, 2.2170], device='cuda:1'), covar=tensor([0.1838, 0.3182, 0.0547, 0.0468, 0.1393, 0.2659, 0.3057, 0.4127], device='cuda:1'), in_proj_covar=tensor([0.0313, 0.0399, 0.0284, 0.0310, 0.0282, 0.0326, 0.0409, 0.0385], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:23:10,905 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.50 vs. limit=5.0 2023-05-19 00:23:11,822 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.697e+02 3.096e+02 3.856e+02 1.458e+03, threshold=6.191e+02, percent-clipped=3.0 2023-05-19 00:23:18,639 INFO [finetune.py:992] (1/2) Epoch 19, batch 7800, loss[loss=0.1863, simple_loss=0.2825, pruned_loss=0.04508, over 12057.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2529, pruned_loss=0.03648, over 2371306.17 frames. ], batch size: 37, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:23:30,132 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328030.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:44,597 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328051.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 00:23:48,872 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328057.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:53,059 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328063.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:23:53,576 INFO [finetune.py:992] (1/2) Epoch 19, batch 7850, loss[loss=0.1371, simple_loss=0.2253, pruned_loss=0.02444, over 12417.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2532, pruned_loss=0.03667, over 2364680.20 frames. ], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:24:13,259 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328091.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:24:22,220 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. 
limit=2.0 2023-05-19 00:24:22,349 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.646e+02 2.622e+02 2.990e+02 3.532e+02 1.143e+03, threshold=5.980e+02, percent-clipped=1.0 2023-05-19 00:24:29,363 INFO [finetune.py:992] (1/2) Epoch 19, batch 7900, loss[loss=0.1964, simple_loss=0.2861, pruned_loss=0.05335, over 12028.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2536, pruned_loss=0.03711, over 2357030.56 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:24:45,864 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4614, 5.3089, 5.3918, 5.4776, 5.1355, 5.1447, 4.8932, 5.4111], device='cuda:1'), covar=tensor([0.0727, 0.0634, 0.0994, 0.0578, 0.1834, 0.1414, 0.0561, 0.1222], device='cuda:1'), in_proj_covar=tensor([0.0578, 0.0758, 0.0663, 0.0678, 0.0904, 0.0793, 0.0605, 0.0517], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:25:04,409 INFO [finetune.py:992] (1/2) Epoch 19, batch 7950, loss[loss=0.1481, simple_loss=0.2422, pruned_loss=0.02707, over 12275.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.254, pruned_loss=0.03721, over 2360161.51 frames. ], batch size: 33, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:25:08,617 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328170.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:25:32,225 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.594e+02 2.996e+02 3.774e+02 6.763e+02, threshold=5.992e+02, percent-clipped=1.0 2023-05-19 00:25:39,800 INFO [finetune.py:992] (1/2) Epoch 19, batch 8000, loss[loss=0.2847, simple_loss=0.3449, pruned_loss=0.1123, over 7524.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2536, pruned_loss=0.03692, over 2363484.62 frames. ], batch size: 97, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:25:42,569 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328218.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:25:42,622 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328218.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:26:14,229 INFO [finetune.py:992] (1/2) Epoch 19, batch 8050, loss[loss=0.1891, simple_loss=0.2762, pruned_loss=0.05096, over 12026.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2544, pruned_loss=0.03756, over 2358198.07 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:26:16,259 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328266.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:26:42,325 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.898e+02 2.636e+02 3.070e+02 3.893e+02 1.409e+03, threshold=6.140e+02, percent-clipped=3.0 2023-05-19 00:26:49,275 INFO [finetune.py:992] (1/2) Epoch 19, batch 8100, loss[loss=0.1545, simple_loss=0.2495, pruned_loss=0.02972, over 10743.00 frames. ], tot_loss[loss=0.1652, simple_loss=0.2545, pruned_loss=0.03792, over 2351777.75 frames. 
], batch size: 68, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:27:11,441 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328346.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 00:27:16,240 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328352.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:27:20,352 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328358.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:27:24,448 INFO [finetune.py:992] (1/2) Epoch 19, batch 8150, loss[loss=0.1547, simple_loss=0.2405, pruned_loss=0.03448, over 12189.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.2539, pruned_loss=0.03761, over 2359566.69 frames. ], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:27:27,535 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1337, 3.9208, 2.6089, 2.2794, 3.4579, 2.3557, 3.5306, 2.9586], device='cuda:1'), covar=tensor([0.0769, 0.0909, 0.1255, 0.1838, 0.0404, 0.1506, 0.0654, 0.0852], device='cuda:1'), in_proj_covar=tensor([0.0197, 0.0272, 0.0187, 0.0213, 0.0151, 0.0193, 0.0211, 0.0183], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:27:32,413 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0720, 4.5839, 4.0075, 4.8503, 4.3721, 2.7119, 4.0866, 2.8940], device='cuda:1'), covar=tensor([0.0946, 0.0774, 0.1613, 0.0667, 0.1202, 0.2004, 0.1110, 0.3651], device='cuda:1'), in_proj_covar=tensor([0.0316, 0.0386, 0.0370, 0.0349, 0.0379, 0.0281, 0.0354, 0.0372], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:27:36,111 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-05-19 00:27:39,901 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=328386.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:27:52,510 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.617e+02 3.231e+02 3.569e+02 6.494e+02, threshold=6.462e+02, percent-clipped=2.0 2023-05-19 00:27:59,274 INFO [finetune.py:992] (1/2) Epoch 19, batch 8200, loss[loss=0.1797, simple_loss=0.2709, pruned_loss=0.04427, over 12107.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2539, pruned_loss=0.0374, over 2366271.41 frames. ], batch size: 38, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:28:34,504 INFO [finetune.py:992] (1/2) Epoch 19, batch 8250, loss[loss=0.1574, simple_loss=0.2449, pruned_loss=0.03493, over 12090.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2533, pruned_loss=0.03716, over 2369735.73 frames. ], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:29:02,777 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.551e+02 2.709e+02 3.179e+02 3.720e+02 7.019e+02, threshold=6.358e+02, percent-clipped=3.0 2023-05-19 00:29:09,590 INFO [finetune.py:992] (1/2) Epoch 19, batch 8300, loss[loss=0.1483, simple_loss=0.2452, pruned_loss=0.02571, over 12293.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.254, pruned_loss=0.03743, over 2360911.87 frames. ], batch size: 33, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:29:43,864 INFO [finetune.py:992] (1/2) Epoch 19, batch 8350, loss[loss=0.1382, simple_loss=0.2215, pruned_loss=0.02746, over 12358.00 frames. ], tot_loss[loss=0.165, simple_loss=0.2543, pruned_loss=0.03785, over 2355858.36 frames. 
], batch size: 30, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:30:12,370 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.605e+02 3.075e+02 3.770e+02 5.047e+02, threshold=6.151e+02, percent-clipped=0.0 2023-05-19 00:30:19,352 INFO [finetune.py:992] (1/2) Epoch 19, batch 8400, loss[loss=0.1653, simple_loss=0.2566, pruned_loss=0.03696, over 12294.00 frames. ], tot_loss[loss=0.1649, simple_loss=0.2546, pruned_loss=0.03764, over 2359672.64 frames. ], batch size: 33, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:30:42,115 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328646.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:30:46,239 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328652.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:30:50,410 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328658.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:30:54,508 INFO [finetune.py:992] (1/2) Epoch 19, batch 8450, loss[loss=0.1767, simple_loss=0.27, pruned_loss=0.04175, over 12030.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2543, pruned_loss=0.03728, over 2365856.17 frames. ], batch size: 40, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:31:09,914 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=328686.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:15,265 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328694.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:17,036 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 00:31:19,554 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328700.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:22,244 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.607e+02 3.087e+02 3.747e+02 7.122e+02, threshold=6.173e+02, percent-clipped=3.0 2023-05-19 00:31:23,689 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328706.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:29,727 INFO [finetune.py:992] (1/2) Epoch 19, batch 8500, loss[loss=0.1676, simple_loss=0.2441, pruned_loss=0.04556, over 12082.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2547, pruned_loss=0.03721, over 2362434.27 frames. ], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:31:41,267 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328730.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:43,954 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=328734.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:31:53,741 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328748.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:32:04,475 INFO [finetune.py:992] (1/2) Epoch 19, batch 8550, loss[loss=0.183, simple_loss=0.2825, pruned_loss=0.04175, over 11303.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.2546, pruned_loss=0.03704, over 2367147.13 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:32:04,877 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. 
limit=2.0 2023-05-19 00:32:24,450 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328791.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:32:33,599 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.478e+02 2.919e+02 3.511e+02 6.754e+02, threshold=5.838e+02, percent-clipped=1.0 2023-05-19 00:32:36,479 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2946, 4.8455, 5.3017, 4.5418, 4.9611, 4.7165, 5.3547, 4.9478], device='cuda:1'), covar=tensor([0.0296, 0.0442, 0.0294, 0.0292, 0.0425, 0.0345, 0.0199, 0.0361], device='cuda:1'), in_proj_covar=tensor([0.0289, 0.0291, 0.0319, 0.0286, 0.0287, 0.0286, 0.0262, 0.0235], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:32:37,246 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328809.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:32:40,601 INFO [finetune.py:992] (1/2) Epoch 19, batch 8600, loss[loss=0.2175, simple_loss=0.2982, pruned_loss=0.06844, over 7983.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2548, pruned_loss=0.03704, over 2365859.61 frames. ], batch size: 97, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:32:45,047 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1570, 4.6453, 3.9843, 4.8938, 4.4684, 2.8946, 4.1686, 2.9285], device='cuda:1'), covar=tensor([0.0933, 0.0751, 0.1597, 0.0573, 0.1241, 0.1840, 0.1211, 0.3920], device='cuda:1'), in_proj_covar=tensor([0.0315, 0.0386, 0.0370, 0.0349, 0.0378, 0.0281, 0.0354, 0.0373], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:32:46,434 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2645, 2.7011, 3.8146, 3.2818, 3.6386, 3.3569, 2.8411, 3.7288], device='cuda:1'), covar=tensor([0.0157, 0.0363, 0.0175, 0.0251, 0.0188, 0.0203, 0.0368, 0.0149], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0216, 0.0205, 0.0200, 0.0231, 0.0179, 0.0209, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:32:50,572 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328828.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:33:03,707 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7813, 3.4614, 5.1036, 2.8906, 3.0488, 3.9370, 3.2296, 3.7210], device='cuda:1'), covar=tensor([0.0409, 0.1115, 0.0379, 0.1054, 0.1735, 0.1400, 0.1386, 0.1490], device='cuda:1'), in_proj_covar=tensor([0.0242, 0.0239, 0.0266, 0.0188, 0.0239, 0.0297, 0.0229, 0.0275], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:33:13,308 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=328861.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:33:15,793 INFO [finetune.py:992] (1/2) Epoch 19, batch 8650, loss[loss=0.1579, simple_loss=0.2484, pruned_loss=0.03369, over 12149.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2543, pruned_loss=0.03683, over 2373009.22 frames. 
], batch size: 34, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:33:33,161 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328889.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:33:43,392 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.783e+02 2.567e+02 3.201e+02 3.541e+02 5.450e+02, threshold=6.402e+02, percent-clipped=0.0 2023-05-19 00:33:50,353 INFO [finetune.py:992] (1/2) Epoch 19, batch 8700, loss[loss=0.1822, simple_loss=0.2818, pruned_loss=0.04133, over 12206.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.255, pruned_loss=0.03704, over 2369223.05 frames. ], batch size: 35, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:33:56,003 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=328922.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:34:25,540 INFO [finetune.py:992] (1/2) Epoch 19, batch 8750, loss[loss=0.1608, simple_loss=0.2563, pruned_loss=0.03268, over 12129.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2549, pruned_loss=0.03679, over 2370545.41 frames. ], batch size: 38, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:34:53,804 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.542e+02 3.003e+02 3.580e+02 9.086e+02, threshold=6.006e+02, percent-clipped=1.0 2023-05-19 00:35:01,328 INFO [finetune.py:992] (1/2) Epoch 19, batch 8800, loss[loss=0.169, simple_loss=0.2623, pruned_loss=0.03787, over 11284.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2541, pruned_loss=0.03684, over 2371294.03 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:35:02,157 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1992, 5.0371, 5.1360, 5.1947, 4.8278, 4.8832, 4.6359, 5.0710], device='cuda:1'), covar=tensor([0.0685, 0.0678, 0.1010, 0.0580, 0.2030, 0.1376, 0.0647, 0.1165], device='cuda:1'), in_proj_covar=tensor([0.0578, 0.0752, 0.0662, 0.0675, 0.0906, 0.0789, 0.0604, 0.0516], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:35:11,825 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6199, 3.1520, 5.1540, 2.4956, 2.7412, 3.7223, 3.0897, 3.7533], device='cuda:1'), covar=tensor([0.0513, 0.1359, 0.0290, 0.1342, 0.2067, 0.1605, 0.1602, 0.1334], device='cuda:1'), in_proj_covar=tensor([0.0242, 0.0239, 0.0266, 0.0188, 0.0239, 0.0297, 0.0228, 0.0274], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:35:35,955 INFO [finetune.py:992] (1/2) Epoch 19, batch 8850, loss[loss=0.2511, simple_loss=0.3272, pruned_loss=0.08745, over 8173.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2536, pruned_loss=0.03657, over 2366822.24 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:35:44,193 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. 
limit=2.0 2023-05-19 00:35:45,052 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329077.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:35:51,812 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329086.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:36:04,367 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.566e+02 3.001e+02 3.699e+02 8.491e+02, threshold=6.002e+02, percent-clipped=2.0 2023-05-19 00:36:04,461 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329104.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:36:10,986 INFO [finetune.py:992] (1/2) Epoch 19, batch 8900, loss[loss=0.1689, simple_loss=0.262, pruned_loss=0.03789, over 12085.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2532, pruned_loss=0.03664, over 2365109.81 frames. ], batch size: 32, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:36:27,671 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329138.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:36:46,171 INFO [finetune.py:992] (1/2) Epoch 19, batch 8950, loss[loss=0.1581, simple_loss=0.2335, pruned_loss=0.04133, over 12344.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2527, pruned_loss=0.03657, over 2363969.75 frames. ], batch size: 30, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:36:57,449 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7370, 2.9038, 4.5732, 4.6911, 2.8356, 2.6335, 3.0731, 2.2239], device='cuda:1'), covar=tensor([0.1815, 0.3217, 0.0515, 0.0499, 0.1447, 0.2657, 0.2824, 0.4227], device='cuda:1'), in_proj_covar=tensor([0.0316, 0.0400, 0.0288, 0.0313, 0.0285, 0.0330, 0.0412, 0.0388], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:37:00,126 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329184.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:37:15,438 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.669e+02 2.793e+02 3.123e+02 3.747e+02 1.122e+03, threshold=6.245e+02, percent-clipped=3.0 2023-05-19 00:37:15,687 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1448, 3.5523, 3.7550, 4.1460, 2.8775, 3.5995, 2.7463, 3.7851], device='cuda:1'), covar=tensor([0.1711, 0.0911, 0.0853, 0.0729, 0.1163, 0.0705, 0.1754, 0.0922], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0273, 0.0302, 0.0367, 0.0249, 0.0247, 0.0266, 0.0373], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:37:22,324 INFO [finetune.py:992] (1/2) Epoch 19, batch 9000, loss[loss=0.1551, simple_loss=0.2416, pruned_loss=0.03431, over 12192.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2532, pruned_loss=0.03684, over 2364931.85 frames. ], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:37:22,324 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 00:37:39,915 INFO [finetune.py:1026] (1/2) Epoch 19, validation: loss=0.3217, simple_loss=0.3941, pruned_loss=0.1246, over 1020973.00 frames. 
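The per-batch and validation records above each report three loss fields. The printed loss is numerically consistent with loss = 0.5 * simple_loss + pruned_loss (0.5 matching the simple_loss_scale configured for this run); for example, the Epoch 19 validation record gives 0.5 * 0.3941 + 0.1246 ≈ 0.3217. Below is a minimal sketch, assuming that convention, which rechecks a few of the logged values; the helper name and the hard-coded tuples are illustrative only and are not part of the training code.

```python
# Minimal sketch (assumption): the logged `loss` appears to equal
#   simple_loss_scale * simple_loss + pruned_loss,
# with simple_loss_scale = 0.5 for this run.
# The tuples below are copied from records in this log; the helper is
# illustrative and not taken from finetune.py.

SIMPLE_LOSS_SCALE = 0.5  # scale applied to the simple (non-pruned) transducer loss

# (logged loss, simple_loss, pruned_loss)
LOGGED = [
    (0.3217, 0.3941, 0.1246),   # Epoch 19 validation (before batch 9000)
    (0.1634, 0.2536, 0.03656),  # Epoch 19, batch 6250 tot_loss
    (0.1621, 0.2520, 0.03608),  # Epoch 19, batch 7600 tot_loss
]

def check_loss_combination(logged, scale=SIMPLE_LOSS_SCALE, tol=5e-4):
    """Return True if every logged loss matches scale*simple + pruned within tol."""
    ok = True
    for loss, simple, pruned in logged:
        recomputed = scale * simple + pruned
        print(f"logged={loss:.4f}  recomputed={recomputed:.4f}")
        ok &= abs(recomputed - loss) <= tol
    return ok

if __name__ == "__main__":
    assert check_loss_combination(LOGGED)
```

Because the combination is linear, the same check holds for the running tot_loss averages as well as for the per-batch loss values.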
2023-05-19 00:37:39,916 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12411MB 2023-05-19 00:37:42,000 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329217.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:37:50,994 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6295, 2.8111, 3.8423, 4.7016, 3.8559, 4.8441, 4.1423, 3.5803], device='cuda:1'), covar=tensor([0.0057, 0.0443, 0.0167, 0.0060, 0.0162, 0.0063, 0.0133, 0.0334], device='cuda:1'), in_proj_covar=tensor([0.0094, 0.0127, 0.0109, 0.0086, 0.0109, 0.0122, 0.0108, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:38:14,278 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329263.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:38:14,774 INFO [finetune.py:992] (1/2) Epoch 19, batch 9050, loss[loss=0.1346, simple_loss=0.2215, pruned_loss=0.02385, over 12332.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2521, pruned_loss=0.03623, over 2369876.54 frames. ], batch size: 30, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:38:18,438 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329269.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:38:42,480 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.752e+02 3.154e+02 3.848e+02 9.009e+02, threshold=6.308e+02, percent-clipped=2.0 2023-05-19 00:38:49,337 INFO [finetune.py:992] (1/2) Epoch 19, batch 9100, loss[loss=0.1576, simple_loss=0.2432, pruned_loss=0.03597, over 12024.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2537, pruned_loss=0.03712, over 2360151.85 frames. ], batch size: 31, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:38:56,942 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329324.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:01,184 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329330.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:07,740 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329340.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 00:39:24,482 INFO [finetune.py:992] (1/2) Epoch 19, batch 9150, loss[loss=0.1598, simple_loss=0.2512, pruned_loss=0.03419, over 11742.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.2542, pruned_loss=0.03738, over 2358825.92 frames. ], batch size: 26, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:39:36,398 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329381.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:39,751 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329386.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:51,149 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329401.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 00:39:53,061 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.594e+02 3.008e+02 3.738e+02 1.154e+03, threshold=6.015e+02, percent-clipped=2.0 2023-05-19 00:39:53,233 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329404.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:39:59,906 INFO [finetune.py:992] (1/2) Epoch 19, batch 9200, loss[loss=0.1692, simple_loss=0.2663, pruned_loss=0.03607, over 12255.00 frames. 
], tot_loss[loss=0.1639, simple_loss=0.2538, pruned_loss=0.037, over 2363281.01 frames. ], batch size: 37, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:40:13,470 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329433.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:14,160 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329434.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:19,827 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329442.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:20,594 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8425, 3.1622, 4.7863, 4.9508, 2.9927, 2.7713, 3.0883, 2.2730], device='cuda:1'), covar=tensor([0.1744, 0.2790, 0.0455, 0.0420, 0.1456, 0.2532, 0.2857, 0.4283], device='cuda:1'), in_proj_covar=tensor([0.0320, 0.0406, 0.0292, 0.0319, 0.0289, 0.0335, 0.0417, 0.0394], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:40:26,674 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329452.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:40:34,951 INFO [finetune.py:992] (1/2) Epoch 19, batch 9250, loss[loss=0.1624, simple_loss=0.2592, pruned_loss=0.03286, over 12124.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2522, pruned_loss=0.03643, over 2369681.99 frames. ], batch size: 39, lr: 3.14e-03, grad_scale: 16.0 2023-05-19 00:40:35,451 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.63 vs. limit=5.0 2023-05-19 00:40:42,896 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2694, 4.9680, 5.1529, 5.1651, 5.0060, 5.2225, 5.0314, 2.8210], device='cuda:1'), covar=tensor([0.0102, 0.0061, 0.0071, 0.0055, 0.0043, 0.0085, 0.0072, 0.0725], device='cuda:1'), in_proj_covar=tensor([0.0073, 0.0084, 0.0087, 0.0077, 0.0063, 0.0098, 0.0086, 0.0101], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:40:49,758 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329484.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:01,593 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329501.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:03,523 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.877e+02 2.584e+02 2.998e+02 3.473e+02 7.449e+02, threshold=5.995e+02, percent-clipped=2.0 2023-05-19 00:41:10,371 INFO [finetune.py:992] (1/2) Epoch 19, batch 9300, loss[loss=0.1672, simple_loss=0.264, pruned_loss=0.03522, over 12287.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03626, over 2369187.00 frames. 
], batch size: 37, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:41:12,639 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329517.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:22,956 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329532.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:30,536 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329543.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:34,706 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9230, 3.4904, 5.2159, 2.6292, 3.1444, 3.7913, 3.4905, 3.8044], device='cuda:1'), covar=tensor([0.0394, 0.1077, 0.0320, 0.1245, 0.1800, 0.1585, 0.1243, 0.1335], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0242, 0.0269, 0.0190, 0.0241, 0.0300, 0.0230, 0.0276], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:41:38,734 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329554.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:44,253 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329562.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:45,469 INFO [finetune.py:992] (1/2) Epoch 19, batch 9350, loss[loss=0.145, simple_loss=0.2367, pruned_loss=0.02668, over 12288.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.252, pruned_loss=0.0361, over 2374875.19 frames. ], batch size: 33, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:41:46,225 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329565.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:41:57,072 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3742, 2.3179, 3.7934, 4.2946, 3.7927, 4.4181, 4.0327, 3.1953], device='cuda:1'), covar=tensor([0.0059, 0.0602, 0.0135, 0.0070, 0.0148, 0.0084, 0.0109, 0.0397], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0127, 0.0108, 0.0086, 0.0108, 0.0121, 0.0108, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:42:00,488 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4195, 3.5177, 3.1909, 3.0864, 2.8436, 2.7384, 3.4807, 2.3988], device='cuda:1'), covar=tensor([0.0472, 0.0142, 0.0212, 0.0228, 0.0410, 0.0380, 0.0175, 0.0539], device='cuda:1'), in_proj_covar=tensor([0.0204, 0.0174, 0.0175, 0.0201, 0.0209, 0.0207, 0.0184, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:42:04,167 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. limit=2.0 2023-05-19 00:42:13,092 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.643e+02 3.011e+02 3.895e+02 7.917e+02, threshold=6.022e+02, percent-clipped=5.0 2023-05-19 00:42:13,306 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329604.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:20,592 INFO [finetune.py:992] (1/2) Epoch 19, batch 9400, loss[loss=0.1415, simple_loss=0.2336, pruned_loss=0.02466, over 12092.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2525, pruned_loss=0.03617, over 2373141.84 frames. 
], batch size: 32, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:42:21,476 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329615.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:24,158 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329619.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:28,428 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329625.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:42:35,047 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-05-19 00:42:55,271 INFO [finetune.py:992] (1/2) Epoch 19, batch 9450, loss[loss=0.1768, simple_loss=0.2756, pruned_loss=0.03902, over 12070.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2526, pruned_loss=0.03619, over 2364806.79 frames. ], batch size: 42, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:43:13,936 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1934, 4.7529, 5.1625, 4.5270, 4.8259, 4.6361, 5.2136, 4.8493], device='cuda:1'), covar=tensor([0.0310, 0.0406, 0.0303, 0.0275, 0.0485, 0.0336, 0.0226, 0.0364], device='cuda:1'), in_proj_covar=tensor([0.0286, 0.0288, 0.0315, 0.0282, 0.0284, 0.0282, 0.0258, 0.0231], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:43:18,090 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329696.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 00:43:19,685 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.41 vs. limit=5.0 2023-05-19 00:43:23,552 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.921e+02 2.436e+02 3.019e+02 3.891e+02 1.058e+03, threshold=6.038e+02, percent-clipped=2.0 2023-05-19 00:43:23,907 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-19 00:43:30,470 INFO [finetune.py:992] (1/2) Epoch 19, batch 9500, loss[loss=0.144, simple_loss=0.2324, pruned_loss=0.0278, over 12250.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2532, pruned_loss=0.03633, over 2360826.21 frames. ], batch size: 32, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:43:31,635 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. 
limit=2.0 2023-05-19 00:43:43,962 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329733.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:43:45,439 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5201, 2.5030, 3.2291, 4.3351, 2.4879, 4.3367, 4.4513, 4.5121], device='cuda:1'), covar=tensor([0.0130, 0.1347, 0.0491, 0.0154, 0.1267, 0.0280, 0.0143, 0.0103], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0210, 0.0189, 0.0128, 0.0195, 0.0189, 0.0188, 0.0132], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 00:43:46,672 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329737.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:44:01,307 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4495, 5.0441, 5.4343, 4.7934, 5.0703, 4.8893, 5.4719, 5.0887], device='cuda:1'), covar=tensor([0.0301, 0.0411, 0.0312, 0.0255, 0.0480, 0.0303, 0.0247, 0.0332], device='cuda:1'), in_proj_covar=tensor([0.0287, 0.0289, 0.0316, 0.0283, 0.0286, 0.0283, 0.0259, 0.0233], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:44:06,079 INFO [finetune.py:992] (1/2) Epoch 19, batch 9550, loss[loss=0.1357, simple_loss=0.2196, pruned_loss=0.0259, over 12326.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.252, pruned_loss=0.03581, over 2367046.16 frames. ], batch size: 30, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:44:17,886 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.74 vs. limit=5.0 2023-05-19 00:44:18,108 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329781.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:44:28,709 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6878, 2.9562, 3.4437, 4.5509, 2.7975, 4.5390, 4.6244, 4.7347], device='cuda:1'), covar=tensor([0.0187, 0.1181, 0.0475, 0.0166, 0.1201, 0.0304, 0.0204, 0.0099], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0210, 0.0189, 0.0128, 0.0196, 0.0189, 0.0188, 0.0133], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 00:44:34,369 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.446e+02 2.927e+02 3.662e+02 8.216e+02, threshold=5.853e+02, percent-clipped=4.0 2023-05-19 00:44:41,231 INFO [finetune.py:992] (1/2) Epoch 19, batch 9600, loss[loss=0.1533, simple_loss=0.2463, pruned_loss=0.03018, over 12348.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2511, pruned_loss=0.03544, over 2370092.69 frames. ], batch size: 35, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:45:11,917 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329857.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:16,631 INFO [finetune.py:992] (1/2) Epoch 19, batch 9650, loss[loss=0.1549, simple_loss=0.2515, pruned_loss=0.02919, over 12185.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2508, pruned_loss=0.03544, over 2369663.85 frames. 
], batch size: 35, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:45:19,510 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329868.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:40,925 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329899.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:44,310 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.640e+02 3.056e+02 3.733e+02 8.200e+02, threshold=6.112e+02, percent-clipped=3.0 2023-05-19 00:45:49,137 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=329910.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:51,818 INFO [finetune.py:992] (1/2) Epoch 19, batch 9700, loss[loss=0.165, simple_loss=0.2554, pruned_loss=0.03732, over 12155.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2525, pruned_loss=0.03595, over 2373440.51 frames. ], batch size: 34, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:45:55,434 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329919.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:45:59,695 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329925.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:02,468 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=329929.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 00:46:08,046 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4454, 2.6764, 3.7188, 4.4578, 3.7810, 4.5561, 3.9009, 3.2778], device='cuda:1'), covar=tensor([0.0054, 0.0430, 0.0154, 0.0051, 0.0136, 0.0073, 0.0140, 0.0370], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0127, 0.0109, 0.0086, 0.0109, 0.0122, 0.0109, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:46:26,503 INFO [finetune.py:992] (1/2) Epoch 19, batch 9750, loss[loss=0.1626, simple_loss=0.261, pruned_loss=0.03207, over 10562.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.252, pruned_loss=0.03566, over 2371891.91 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:46:28,698 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329967.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:32,849 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=329973.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:42,010 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 00:46:47,120 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=329993.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:46:49,097 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=329996.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 00:46:57,549 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 2.696e+02 3.076e+02 3.718e+02 7.389e+02, threshold=6.152e+02, percent-clipped=3.0 2023-05-19 00:47:04,583 INFO [finetune.py:992] (1/2) Epoch 19, batch 9800, loss[loss=0.1353, simple_loss=0.2178, pruned_loss=0.02641, over 12281.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2526, pruned_loss=0.03592, over 2374958.43 frames. 
], batch size: 28, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:47:20,560 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330037.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:47:25,395 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330044.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 00:47:32,925 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=330054.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:47:39,702 INFO [finetune.py:992] (1/2) Epoch 19, batch 9850, loss[loss=0.1326, simple_loss=0.2135, pruned_loss=0.02585, over 11364.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2523, pruned_loss=0.03595, over 2371356.65 frames. ], batch size: 25, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:47:45,638 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.08 vs. limit=5.0 2023-05-19 00:47:54,315 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330085.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:48:07,189 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.714e+02 3.098e+02 3.571e+02 7.678e+02, threshold=6.196e+02, percent-clipped=2.0 2023-05-19 00:48:14,326 INFO [finetune.py:992] (1/2) Epoch 19, batch 9900, loss[loss=0.1913, simple_loss=0.2801, pruned_loss=0.05122, over 11262.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2524, pruned_loss=0.0362, over 2374145.59 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:48:25,886 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2634, 5.1228, 5.2284, 5.2760, 4.9301, 4.9276, 4.7197, 5.1514], device='cuda:1'), covar=tensor([0.0800, 0.0636, 0.0989, 0.0611, 0.1915, 0.1499, 0.0640, 0.1241], device='cuda:1'), in_proj_covar=tensor([0.0576, 0.0754, 0.0661, 0.0674, 0.0904, 0.0790, 0.0606, 0.0515], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:48:35,185 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-05-19 00:48:45,361 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330157.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:48:50,045 INFO [finetune.py:992] (1/2) Epoch 19, batch 9950, loss[loss=0.1502, simple_loss=0.2422, pruned_loss=0.02905, over 12352.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.252, pruned_loss=0.03616, over 2371954.57 frames. 
], batch size: 35, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:49:02,709 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2919, 4.8433, 5.2880, 4.6065, 4.9547, 4.7520, 5.3444, 5.0137], device='cuda:1'), covar=tensor([0.0299, 0.0407, 0.0281, 0.0272, 0.0434, 0.0308, 0.0199, 0.0310], device='cuda:1'), in_proj_covar=tensor([0.0285, 0.0289, 0.0315, 0.0282, 0.0284, 0.0283, 0.0258, 0.0231], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:49:15,224 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330199.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:18,897 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.491e+02 2.998e+02 3.513e+02 7.813e+02, threshold=5.996e+02, percent-clipped=2.0 2023-05-19 00:49:19,722 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330205.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:23,189 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330210.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:25,790 INFO [finetune.py:992] (1/2) Epoch 19, batch 10000, loss[loss=0.1568, simple_loss=0.2591, pruned_loss=0.02726, over 12346.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2522, pruned_loss=0.0361, over 2373557.97 frames. ], batch size: 36, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:49:32,656 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=330224.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 00:49:48,501 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330247.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:49:53,002 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.49 vs. limit=2.0 2023-05-19 00:49:56,005 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330258.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:50:00,215 INFO [finetune.py:992] (1/2) Epoch 19, batch 10050, loss[loss=0.1673, simple_loss=0.2574, pruned_loss=0.03861, over 11287.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.03604, over 2369939.13 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:50:28,379 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.676e+02 2.491e+02 3.061e+02 3.700e+02 9.999e+02, threshold=6.122e+02, percent-clipped=4.0 2023-05-19 00:50:35,435 INFO [finetune.py:992] (1/2) Epoch 19, batch 10100, loss[loss=0.1725, simple_loss=0.264, pruned_loss=0.04046, over 11307.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2527, pruned_loss=0.03615, over 2373570.68 frames. 
], batch size: 55, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:51:00,249 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=330349.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:51:08,803 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5356, 5.3789, 5.4928, 5.5467, 5.1627, 5.2124, 4.9998, 5.3846], device='cuda:1'), covar=tensor([0.0689, 0.0566, 0.0806, 0.0558, 0.1898, 0.1207, 0.0591, 0.1244], device='cuda:1'), in_proj_covar=tensor([0.0578, 0.0757, 0.0663, 0.0677, 0.0908, 0.0792, 0.0608, 0.0516], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:51:10,772 INFO [finetune.py:992] (1/2) Epoch 19, batch 10150, loss[loss=0.1714, simple_loss=0.2669, pruned_loss=0.03797, over 11286.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2522, pruned_loss=0.03591, over 2369695.49 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:51:38,427 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7946, 3.7440, 3.3419, 3.1793, 2.9871, 2.8436, 3.7054, 2.4641], device='cuda:1'), covar=tensor([0.0370, 0.0132, 0.0251, 0.0246, 0.0422, 0.0386, 0.0151, 0.0526], device='cuda:1'), in_proj_covar=tensor([0.0204, 0.0173, 0.0176, 0.0203, 0.0209, 0.0209, 0.0184, 0.0214], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:51:38,899 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.529e+02 2.869e+02 3.349e+02 8.693e+02, threshold=5.739e+02, percent-clipped=2.0 2023-05-19 00:51:45,831 INFO [finetune.py:992] (1/2) Epoch 19, batch 10200, loss[loss=0.1548, simple_loss=0.243, pruned_loss=0.03323, over 12099.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2528, pruned_loss=0.03569, over 2370285.73 frames. ], batch size: 33, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:52:08,035 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2347, 5.0831, 5.1868, 5.2396, 4.8446, 4.9460, 4.6464, 5.1113], device='cuda:1'), covar=tensor([0.0789, 0.0657, 0.1012, 0.0596, 0.2165, 0.1221, 0.0705, 0.1233], device='cuda:1'), in_proj_covar=tensor([0.0580, 0.0754, 0.0665, 0.0679, 0.0910, 0.0791, 0.0610, 0.0518], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:52:21,102 INFO [finetune.py:992] (1/2) Epoch 19, batch 10250, loss[loss=0.1615, simple_loss=0.2507, pruned_loss=0.03617, over 12360.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2523, pruned_loss=0.03539, over 2372764.41 frames. ], batch size: 35, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:52:49,377 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.666e+02 3.051e+02 3.730e+02 8.673e+02, threshold=6.102e+02, percent-clipped=3.0 2023-05-19 00:52:53,042 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2514, 4.6322, 2.7894, 2.4041, 3.9774, 2.4880, 3.9074, 3.1361], device='cuda:1'), covar=tensor([0.0730, 0.0430, 0.1232, 0.1716, 0.0273, 0.1327, 0.0510, 0.0844], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0268, 0.0181, 0.0207, 0.0148, 0.0187, 0.0206, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:52:56,302 INFO [finetune.py:992] (1/2) Epoch 19, batch 10300, loss[loss=0.1725, simple_loss=0.2659, pruned_loss=0.03956, over 12095.00 frames. 
], tot_loss[loss=0.1614, simple_loss=0.2519, pruned_loss=0.03546, over 2375802.17 frames. ], batch size: 38, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:53:03,495 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330524.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:53:08,453 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=330531.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:53:16,233 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6687, 3.6558, 3.2856, 3.1484, 2.9862, 2.7986, 3.6821, 2.5083], device='cuda:1'), covar=tensor([0.0427, 0.0164, 0.0250, 0.0231, 0.0411, 0.0405, 0.0158, 0.0543], device='cuda:1'), in_proj_covar=tensor([0.0204, 0.0174, 0.0176, 0.0203, 0.0209, 0.0209, 0.0184, 0.0214], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:53:31,212 INFO [finetune.py:992] (1/2) Epoch 19, batch 10350, loss[loss=0.1802, simple_loss=0.2749, pruned_loss=0.04269, over 12132.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2522, pruned_loss=0.03562, over 2377829.74 frames. ], batch size: 38, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:53:36,786 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330572.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:53:51,417 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=330592.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:54:00,001 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.557e+02 3.095e+02 3.697e+02 6.958e+02, threshold=6.189e+02, percent-clipped=1.0 2023-05-19 00:54:06,966 INFO [finetune.py:992] (1/2) Epoch 19, batch 10400, loss[loss=0.1904, simple_loss=0.2801, pruned_loss=0.05033, over 12104.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2524, pruned_loss=0.03594, over 2379239.87 frames. ], batch size: 42, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:54:14,785 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-05-19 00:54:31,678 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=330649.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:54:36,116 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8827, 3.5951, 5.2998, 2.8282, 2.9568, 3.9213, 3.2989, 3.8706], device='cuda:1'), covar=tensor([0.0401, 0.1040, 0.0440, 0.1187, 0.1915, 0.1651, 0.1398, 0.1227], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0242, 0.0269, 0.0190, 0.0241, 0.0300, 0.0231, 0.0276], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 00:54:42,169 INFO [finetune.py:992] (1/2) Epoch 19, batch 10450, loss[loss=0.145, simple_loss=0.2292, pruned_loss=0.03041, over 12353.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2534, pruned_loss=0.03635, over 2376308.64 frames. ], batch size: 30, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:55:01,902 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.24 vs. 
limit=5.0 2023-05-19 00:55:02,501 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8975, 4.7563, 4.7046, 4.7364, 4.4084, 4.8870, 4.8784, 5.0391], device='cuda:1'), covar=tensor([0.0223, 0.0166, 0.0205, 0.0417, 0.0785, 0.0323, 0.0144, 0.0193], device='cuda:1'), in_proj_covar=tensor([0.0209, 0.0212, 0.0204, 0.0263, 0.0254, 0.0234, 0.0189, 0.0246], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 00:55:05,179 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=330697.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:55:10,060 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.534e+02 3.011e+02 3.548e+02 6.223e+02, threshold=6.023e+02, percent-clipped=1.0 2023-05-19 00:55:17,032 INFO [finetune.py:992] (1/2) Epoch 19, batch 10500, loss[loss=0.1375, simple_loss=0.2261, pruned_loss=0.02444, over 12189.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2528, pruned_loss=0.03627, over 2371098.85 frames. ], batch size: 29, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:55:51,343 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8266, 2.8881, 4.7098, 4.7920, 2.9654, 2.6568, 3.1030, 2.2848], device='cuda:1'), covar=tensor([0.1700, 0.3009, 0.0432, 0.0445, 0.1360, 0.2669, 0.2903, 0.4351], device='cuda:1'), in_proj_covar=tensor([0.0319, 0.0404, 0.0290, 0.0317, 0.0288, 0.0333, 0.0415, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:55:52,437 INFO [finetune.py:992] (1/2) Epoch 19, batch 10550, loss[loss=0.1499, simple_loss=0.2377, pruned_loss=0.03108, over 11986.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2526, pruned_loss=0.03617, over 2364725.28 frames. ], batch size: 28, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:56:19,401 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1859, 2.4069, 3.7136, 3.1253, 3.5696, 3.2448, 2.5844, 3.5491], device='cuda:1'), covar=tensor([0.0169, 0.0512, 0.0190, 0.0291, 0.0181, 0.0230, 0.0450, 0.0206], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0217, 0.0207, 0.0201, 0.0233, 0.0181, 0.0210, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:56:20,583 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.559e+02 2.899e+02 3.575e+02 7.514e+02, threshold=5.797e+02, percent-clipped=2.0 2023-05-19 00:56:27,323 INFO [finetune.py:992] (1/2) Epoch 19, batch 10600, loss[loss=0.1397, simple_loss=0.2259, pruned_loss=0.02675, over 12244.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2533, pruned_loss=0.03644, over 2365306.47 frames. ], batch size: 32, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:57:01,429 INFO [finetune.py:992] (1/2) Epoch 19, batch 10650, loss[loss=0.1726, simple_loss=0.2646, pruned_loss=0.04032, over 12241.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2529, pruned_loss=0.03624, over 2361951.03 frames. 
], batch size: 32, lr: 3.13e-03, grad_scale: 32.0 2023-05-19 00:57:18,533 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=330887.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 00:57:24,315 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4170, 4.7004, 4.1507, 5.1088, 4.6032, 3.1206, 4.0803, 3.1113], device='cuda:1'), covar=tensor([0.0822, 0.0914, 0.1472, 0.0513, 0.1088, 0.1606, 0.1332, 0.3545], device='cuda:1'), in_proj_covar=tensor([0.0315, 0.0389, 0.0369, 0.0350, 0.0379, 0.0281, 0.0355, 0.0373], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:57:31,058 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.601e+02 3.010e+02 3.765e+02 7.025e+02, threshold=6.019e+02, percent-clipped=3.0 2023-05-19 00:57:37,836 INFO [finetune.py:992] (1/2) Epoch 19, batch 10700, loss[loss=0.1574, simple_loss=0.2465, pruned_loss=0.03418, over 12140.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2532, pruned_loss=0.03629, over 2366388.19 frames. ], batch size: 29, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:57:51,259 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1411, 4.9930, 5.1243, 5.1674, 4.7666, 4.8665, 4.5719, 5.0243], device='cuda:1'), covar=tensor([0.0722, 0.0636, 0.0881, 0.0576, 0.2106, 0.1324, 0.0664, 0.1311], device='cuda:1'), in_proj_covar=tensor([0.0586, 0.0763, 0.0668, 0.0683, 0.0919, 0.0799, 0.0615, 0.0524], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 00:58:06,009 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2662, 4.6336, 2.9608, 2.3825, 4.0866, 2.3783, 3.8885, 3.2250], device='cuda:1'), covar=tensor([0.0776, 0.0540, 0.1143, 0.2055, 0.0402, 0.1718, 0.0553, 0.0900], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0270, 0.0183, 0.0209, 0.0150, 0.0189, 0.0208, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:58:12,727 INFO [finetune.py:992] (1/2) Epoch 19, batch 10750, loss[loss=0.1545, simple_loss=0.2516, pruned_loss=0.02875, over 12107.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2523, pruned_loss=0.03614, over 2368879.36 frames. ], batch size: 33, lr: 3.13e-03, grad_scale: 16.0 2023-05-19 00:58:13,654 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2014, 2.6151, 3.7753, 3.1205, 3.6344, 3.2756, 2.6998, 3.7337], device='cuda:1'), covar=tensor([0.0157, 0.0422, 0.0165, 0.0294, 0.0188, 0.0210, 0.0385, 0.0142], device='cuda:1'), in_proj_covar=tensor([0.0197, 0.0219, 0.0209, 0.0203, 0.0236, 0.0183, 0.0212, 0.0209], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:58:41,602 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.665e+02 3.088e+02 3.452e+02 7.597e+02, threshold=6.175e+02, percent-clipped=1.0 2023-05-19 00:58:42,609 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-19 00:58:47,182 INFO [finetune.py:992] (1/2) Epoch 19, batch 10800, loss[loss=0.1824, simple_loss=0.2778, pruned_loss=0.04356, over 11168.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.2543, pruned_loss=0.03698, over 2357438.63 frames. 
], batch size: 55, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 00:58:48,831 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4788, 2.4052, 3.1379, 4.2673, 2.3086, 4.2866, 4.4156, 4.4682], device='cuda:1'), covar=tensor([0.0161, 0.1551, 0.0612, 0.0222, 0.1537, 0.0314, 0.0179, 0.0130], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0208, 0.0188, 0.0127, 0.0193, 0.0188, 0.0186, 0.0132], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 00:58:49,811 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 00:58:52,244 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4034, 4.7605, 2.9445, 2.8205, 4.0299, 2.6521, 3.9797, 3.3237], device='cuda:1'), covar=tensor([0.0690, 0.0476, 0.1253, 0.1476, 0.0353, 0.1398, 0.0477, 0.0814], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0269, 0.0183, 0.0209, 0.0150, 0.0189, 0.0208, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 00:58:58,125 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.64 vs. limit=2.0 2023-05-19 00:59:22,999 INFO [finetune.py:992] (1/2) Epoch 19, batch 10850, loss[loss=0.2297, simple_loss=0.3071, pruned_loss=0.07616, over 7876.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2546, pruned_loss=0.03735, over 2346418.93 frames. ], batch size: 99, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 00:59:52,242 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.812e+02 2.639e+02 3.104e+02 3.812e+02 6.693e+02, threshold=6.207e+02, percent-clipped=2.0 2023-05-19 00:59:57,894 INFO [finetune.py:992] (1/2) Epoch 19, batch 10900, loss[loss=0.179, simple_loss=0.2689, pruned_loss=0.04451, over 10399.00 frames. ], tot_loss[loss=0.1648, simple_loss=0.2547, pruned_loss=0.03742, over 2350488.89 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 01:00:28,051 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8679, 2.1337, 3.6018, 2.9387, 3.4622, 2.9051, 2.3600, 3.5175], device='cuda:1'), covar=tensor([0.0212, 0.0558, 0.0201, 0.0303, 0.0171, 0.0289, 0.0513, 0.0180], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0218, 0.0207, 0.0201, 0.0235, 0.0181, 0.0210, 0.0207], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:00:32,193 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4163, 4.8085, 3.0995, 2.7595, 4.1576, 2.7006, 4.0083, 3.4443], device='cuda:1'), covar=tensor([0.0754, 0.0548, 0.1144, 0.1586, 0.0338, 0.1390, 0.0595, 0.0782], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0268, 0.0182, 0.0208, 0.0149, 0.0188, 0.0207, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:00:32,687 INFO [finetune.py:992] (1/2) Epoch 19, batch 10950, loss[loss=0.1603, simple_loss=0.2553, pruned_loss=0.03263, over 12200.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2557, pruned_loss=0.03828, over 2342844.80 frames. 
], batch size: 31, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 01:00:48,535 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=331187.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:01:02,268 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.828e+02 3.249e+02 3.909e+02 9.795e+02, threshold=6.498e+02, percent-clipped=5.0 2023-05-19 01:01:08,071 INFO [finetune.py:992] (1/2) Epoch 19, batch 11000, loss[loss=0.1441, simple_loss=0.2245, pruned_loss=0.0319, over 12374.00 frames. ], tot_loss[loss=0.167, simple_loss=0.2567, pruned_loss=0.03869, over 2329869.58 frames. ], batch size: 30, lr: 3.13e-03, grad_scale: 8.0 2023-05-19 01:01:14,350 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5862, 4.4985, 4.5891, 4.6193, 4.3413, 4.3954, 4.2030, 4.5212], device='cuda:1'), covar=tensor([0.0945, 0.0624, 0.0991, 0.0611, 0.1722, 0.1334, 0.0674, 0.1098], device='cuda:1'), in_proj_covar=tensor([0.0585, 0.0762, 0.0670, 0.0684, 0.0915, 0.0796, 0.0615, 0.0522], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 01:01:22,401 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=331235.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:01:34,555 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.64 vs. limit=2.0 2023-05-19 01:01:42,320 INFO [finetune.py:992] (1/2) Epoch 19, batch 11050, loss[loss=0.1706, simple_loss=0.2572, pruned_loss=0.04204, over 12139.00 frames. ], tot_loss[loss=0.1702, simple_loss=0.2605, pruned_loss=0.03994, over 2297598.10 frames. ], batch size: 39, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:01:58,449 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9881, 3.7663, 5.4757, 3.0001, 3.2020, 4.2539, 3.6724, 4.2009], device='cuda:1'), covar=tensor([0.0455, 0.1139, 0.0306, 0.1201, 0.1837, 0.1331, 0.1234, 0.1068], device='cuda:1'), in_proj_covar=tensor([0.0241, 0.0240, 0.0266, 0.0188, 0.0239, 0.0296, 0.0229, 0.0274], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:02:09,954 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331302.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:02:10,631 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1154, 2.4467, 3.6997, 3.0788, 3.5422, 3.2260, 2.6433, 3.6264], device='cuda:1'), covar=tensor([0.0141, 0.0429, 0.0164, 0.0267, 0.0154, 0.0201, 0.0404, 0.0144], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0218, 0.0207, 0.0201, 0.0234, 0.0181, 0.0210, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:02:12,320 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.196e+02 2.758e+02 3.308e+02 4.257e+02 7.925e+02, threshold=6.616e+02, percent-clipped=3.0 2023-05-19 01:02:17,846 INFO [finetune.py:992] (1/2) Epoch 19, batch 11100, loss[loss=0.2314, simple_loss=0.3021, pruned_loss=0.0803, over 8103.00 frames. ], tot_loss[loss=0.174, simple_loss=0.2636, pruned_loss=0.04223, over 2239377.72 frames. 
], batch size: 99, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:02:26,432 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.6027, 2.4108, 3.4215, 3.5005, 3.5443, 3.6760, 3.4640, 2.5073], device='cuda:1'), covar=tensor([0.0083, 0.0445, 0.0166, 0.0091, 0.0126, 0.0119, 0.0132, 0.0561], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0124, 0.0106, 0.0084, 0.0106, 0.0119, 0.0106, 0.0139], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:02:52,113 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331363.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:02:52,595 INFO [finetune.py:992] (1/2) Epoch 19, batch 11150, loss[loss=0.2859, simple_loss=0.3587, pruned_loss=0.1066, over 6789.00 frames. ], tot_loss[loss=0.1803, simple_loss=0.269, pruned_loss=0.04576, over 2183534.06 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:03:19,165 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9753, 2.2976, 3.2034, 2.8180, 3.1135, 3.0844, 2.3145, 3.2491], device='cuda:1'), covar=tensor([0.0127, 0.0408, 0.0107, 0.0206, 0.0176, 0.0156, 0.0381, 0.0125], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0215, 0.0204, 0.0199, 0.0232, 0.0179, 0.0208, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:03:21,548 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 3.365e+02 3.844e+02 4.601e+02 6.844e+02, threshold=7.689e+02, percent-clipped=1.0 2023-05-19 01:03:26,715 INFO [finetune.py:992] (1/2) Epoch 19, batch 11200, loss[loss=0.3156, simple_loss=0.3767, pruned_loss=0.1273, over 6681.00 frames. ], tot_loss[loss=0.1875, simple_loss=0.2754, pruned_loss=0.04978, over 2138709.92 frames. ], batch size: 102, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:03:29,880 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.59 vs. limit=2.0 2023-05-19 01:03:45,314 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.6773, 3.3090, 3.4629, 3.6889, 3.4105, 3.8301, 3.8086, 3.7832], device='cuda:1'), covar=tensor([0.0261, 0.0235, 0.0189, 0.0444, 0.0599, 0.0289, 0.0190, 0.0294], device='cuda:1'), in_proj_covar=tensor([0.0205, 0.0207, 0.0200, 0.0257, 0.0249, 0.0228, 0.0185, 0.0242], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 01:04:01,268 INFO [finetune.py:992] (1/2) Epoch 19, batch 11250, loss[loss=0.2263, simple_loss=0.3084, pruned_loss=0.07208, over 6944.00 frames. ], tot_loss[loss=0.1951, simple_loss=0.282, pruned_loss=0.05403, over 2083507.42 frames. 
], batch size: 97, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:04:12,214 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2262, 1.9766, 2.1592, 2.0010, 2.2535, 2.3332, 1.8730, 2.2537], device='cuda:1'), covar=tensor([0.0115, 0.0306, 0.0123, 0.0195, 0.0152, 0.0155, 0.0290, 0.0126], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0214, 0.0203, 0.0197, 0.0230, 0.0177, 0.0206, 0.0203], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:04:29,087 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.203e+02 3.419e+02 4.236e+02 5.254e+02 1.140e+03, threshold=8.472e+02, percent-clipped=4.0 2023-05-19 01:04:35,319 INFO [finetune.py:992] (1/2) Epoch 19, batch 11300, loss[loss=0.1917, simple_loss=0.2874, pruned_loss=0.048, over 10358.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2865, pruned_loss=0.05628, over 2043457.87 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:04:47,312 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4626, 3.2662, 3.1324, 3.0158, 2.8589, 2.7628, 3.0031, 2.2039], device='cuda:1'), covar=tensor([0.0427, 0.0149, 0.0164, 0.0219, 0.0305, 0.0325, 0.0207, 0.0586], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0172, 0.0175, 0.0203, 0.0209, 0.0209, 0.0183, 0.0212], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:05:09,050 INFO [finetune.py:992] (1/2) Epoch 19, batch 11350, loss[loss=0.2112, simple_loss=0.2987, pruned_loss=0.06181, over 10259.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2909, pruned_loss=0.05916, over 1985615.85 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:05:17,672 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0216, 2.2585, 3.0472, 3.9792, 2.2259, 4.0656, 3.8894, 3.9947], device='cuda:1'), covar=tensor([0.0172, 0.1391, 0.0518, 0.0165, 0.1419, 0.0202, 0.0294, 0.0155], device='cuda:1'), in_proj_covar=tensor([0.0126, 0.0206, 0.0186, 0.0125, 0.0191, 0.0185, 0.0184, 0.0131], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:05:18,572 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. 
limit=2.0 2023-05-19 01:05:25,518 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8462, 2.5817, 3.5436, 3.6011, 2.9491, 2.6758, 2.6634, 2.3906], device='cuda:1'), covar=tensor([0.1350, 0.2523, 0.0663, 0.0528, 0.1022, 0.2180, 0.2617, 0.3727], device='cuda:1'), in_proj_covar=tensor([0.0315, 0.0396, 0.0285, 0.0311, 0.0283, 0.0328, 0.0409, 0.0385], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:05:37,790 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.593e+02 3.501e+02 4.003e+02 4.850e+02 7.916e+02, threshold=8.006e+02, percent-clipped=0.0 2023-05-19 01:05:39,155 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5489, 5.2219, 4.8114, 4.8578, 5.3284, 4.7111, 4.7950, 4.7617], device='cuda:1'), covar=tensor([0.1283, 0.0958, 0.1258, 0.1560, 0.0812, 0.1907, 0.1734, 0.1027], device='cuda:1'), in_proj_covar=tensor([0.0360, 0.0505, 0.0412, 0.0452, 0.0466, 0.0448, 0.0409, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:05:43,083 INFO [finetune.py:992] (1/2) Epoch 19, batch 11400, loss[loss=0.1611, simple_loss=0.2613, pruned_loss=0.03044, over 12079.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2945, pruned_loss=0.0617, over 1943647.71 frames. ], batch size: 32, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:06:10,023 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331654.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:06:13,265 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=331658.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:06:17,084 INFO [finetune.py:992] (1/2) Epoch 19, batch 11450, loss[loss=0.2606, simple_loss=0.3256, pruned_loss=0.09783, over 6886.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2978, pruned_loss=0.0644, over 1901146.35 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:06:43,965 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.608e+02 3.484e+02 4.007e+02 4.586e+02 8.783e+02, threshold=8.014e+02, percent-clipped=3.0 2023-05-19 01:06:50,106 INFO [finetune.py:992] (1/2) Epoch 19, batch 11500, loss[loss=0.256, simple_loss=0.3204, pruned_loss=0.09582, over 6772.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.3009, pruned_loss=0.06707, over 1860344.81 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:06:50,940 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331715.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:02,587 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7322, 3.7453, 3.7206, 3.8427, 3.6786, 3.7079, 3.5630, 3.7235], device='cuda:1'), covar=tensor([0.1466, 0.0721, 0.1550, 0.0705, 0.1489, 0.1182, 0.0641, 0.1111], device='cuda:1'), in_proj_covar=tensor([0.0553, 0.0726, 0.0634, 0.0645, 0.0864, 0.0758, 0.0582, 0.0499], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:07:03,900 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331734.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:06,704 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. 
limit=2.0 2023-05-19 01:07:23,458 INFO [finetune.py:992] (1/2) Epoch 19, batch 11550, loss[loss=0.2764, simple_loss=0.3265, pruned_loss=0.1132, over 6823.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.3028, pruned_loss=0.06882, over 1813759.05 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:07:35,421 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4310, 2.9210, 3.7387, 2.5184, 2.6573, 3.2645, 2.9557, 3.2403], device='cuda:1'), covar=tensor([0.0600, 0.1252, 0.0342, 0.1254, 0.1785, 0.1155, 0.1379, 0.1050], device='cuda:1'), in_proj_covar=tensor([0.0235, 0.0234, 0.0257, 0.0184, 0.0232, 0.0287, 0.0224, 0.0264], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:07:45,143 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331795.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:45,305 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-05-19 01:07:47,540 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331799.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:07:52,110 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.484e+02 3.907e+02 4.686e+02 6.736e+02, threshold=7.814e+02, percent-clipped=0.0 2023-05-19 01:07:57,276 INFO [finetune.py:992] (1/2) Epoch 19, batch 11600, loss[loss=0.276, simple_loss=0.3353, pruned_loss=0.1083, over 7061.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.3044, pruned_loss=0.07039, over 1792243.04 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:07:58,173 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0906, 4.6505, 3.8551, 4.9949, 4.2357, 2.5351, 3.9598, 2.7995], device='cuda:1'), covar=tensor([0.0967, 0.0725, 0.1788, 0.0441, 0.1574, 0.2350, 0.1482, 0.4065], device='cuda:1'), in_proj_covar=tensor([0.0308, 0.0376, 0.0357, 0.0335, 0.0367, 0.0275, 0.0346, 0.0363], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:08:08,854 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6609, 4.5490, 4.6561, 4.6839, 4.4223, 4.4709, 4.2703, 4.4830], device='cuda:1'), covar=tensor([0.0872, 0.0677, 0.0890, 0.0614, 0.1725, 0.1287, 0.0618, 0.1403], device='cuda:1'), in_proj_covar=tensor([0.0546, 0.0719, 0.0625, 0.0639, 0.0853, 0.0749, 0.0576, 0.0493], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:08:15,202 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.32 vs. limit=5.0 2023-05-19 01:08:18,238 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4500, 3.1516, 3.0678, 3.3110, 2.5398, 3.1695, 2.6555, 2.7879], device='cuda:1'), covar=tensor([0.1456, 0.0866, 0.0835, 0.0577, 0.1047, 0.0765, 0.1616, 0.0626], device='cuda:1'), in_proj_covar=tensor([0.0231, 0.0272, 0.0298, 0.0359, 0.0246, 0.0245, 0.0264, 0.0368], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:08:29,529 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331860.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:08:32,140 INFO [finetune.py:992] (1/2) Epoch 19, batch 11650, loss[loss=0.2336, simple_loss=0.3034, pruned_loss=0.08186, over 6990.00 frames. 
], tot_loss[loss=0.2224, simple_loss=0.3039, pruned_loss=0.07041, over 1783358.78 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:09:00,615 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.560e+02 3.351e+02 3.903e+02 4.645e+02 7.287e+02, threshold=7.805e+02, percent-clipped=0.0 2023-05-19 01:09:06,511 INFO [finetune.py:992] (1/2) Epoch 19, batch 11700, loss[loss=0.244, simple_loss=0.3083, pruned_loss=0.08982, over 6637.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.3034, pruned_loss=0.07092, over 1749093.90 frames. ], batch size: 97, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:09:08,599 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331917.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:09:17,155 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7327, 4.1261, 3.6260, 4.4936, 3.9258, 2.6869, 3.7555, 2.7596], device='cuda:1'), covar=tensor([0.1189, 0.1008, 0.1819, 0.0563, 0.1744, 0.2285, 0.1432, 0.4156], device='cuda:1'), in_proj_covar=tensor([0.0306, 0.0372, 0.0355, 0.0332, 0.0364, 0.0273, 0.0343, 0.0361], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:09:19,174 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-05-19 01:09:35,023 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. limit=2.0 2023-05-19 01:09:35,990 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=331958.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:09:39,691 INFO [finetune.py:992] (1/2) Epoch 19, batch 11750, loss[loss=0.2774, simple_loss=0.336, pruned_loss=0.1094, over 6419.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.3034, pruned_loss=0.07145, over 1723290.13 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:09:49,510 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=331978.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:09:53,345 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=331984.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:11,325 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.185e+02 3.441e+02 4.105e+02 4.910e+02 1.296e+03, threshold=8.210e+02, percent-clipped=2.0 2023-05-19 01:10:11,420 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332006.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:14,075 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332010.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:14,788 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7948, 3.6994, 3.7863, 3.5761, 3.7016, 3.5532, 3.8027, 3.4888], device='cuda:1'), covar=tensor([0.0489, 0.0417, 0.0443, 0.0276, 0.0451, 0.0403, 0.0379, 0.1616], device='cuda:1'), in_proj_covar=tensor([0.0272, 0.0273, 0.0296, 0.0267, 0.0269, 0.0268, 0.0245, 0.0218], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:10:16,501 INFO [finetune.py:992] (1/2) Epoch 19, batch 11800, loss[loss=0.2644, simple_loss=0.3285, pruned_loss=0.1002, over 6331.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.3056, pruned_loss=0.07306, over 1718353.18 frames. 
], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:10:37,689 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332045.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:10:49,844 INFO [finetune.py:992] (1/2) Epoch 19, batch 11850, loss[loss=0.1996, simple_loss=0.2895, pruned_loss=0.05484, over 11611.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.3073, pruned_loss=0.07383, over 1705639.60 frames. ], batch size: 48, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:11:01,163 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9188, 2.2658, 2.5698, 2.9412, 2.2319, 3.0589, 2.9678, 3.0769], device='cuda:1'), covar=tensor([0.0213, 0.1143, 0.0544, 0.0242, 0.1151, 0.0359, 0.0357, 0.0200], device='cuda:1'), in_proj_covar=tensor([0.0123, 0.0204, 0.0183, 0.0123, 0.0188, 0.0181, 0.0179, 0.0128], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:11:07,973 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332090.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:11:18,522 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.422e+02 3.367e+02 4.107e+02 5.003e+02 1.129e+03, threshold=8.213e+02, percent-clipped=1.0 2023-05-19 01:11:23,663 INFO [finetune.py:992] (1/2) Epoch 19, batch 11900, loss[loss=0.2281, simple_loss=0.3146, pruned_loss=0.07074, over 11176.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.3071, pruned_loss=0.07288, over 1699314.52 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:11:34,893 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5208, 4.5150, 4.3856, 4.0273, 4.1323, 4.4986, 4.2769, 4.0957], device='cuda:1'), covar=tensor([0.0968, 0.1018, 0.0705, 0.1455, 0.2320, 0.0815, 0.1418, 0.1142], device='cuda:1'), in_proj_covar=tensor([0.0635, 0.0571, 0.0522, 0.0638, 0.0429, 0.0736, 0.0770, 0.0565], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 01:11:48,563 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=332150.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:11:51,753 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332155.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:11:57,585 INFO [finetune.py:992] (1/2) Epoch 19, batch 11950, loss[loss=0.1697, simple_loss=0.256, pruned_loss=0.04169, over 6939.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.3042, pruned_loss=0.07053, over 1678032.56 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:12:26,659 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.981e+02 3.431e+02 3.993e+02 6.682e+02, threshold=6.862e+02, percent-clipped=0.0 2023-05-19 01:12:30,268 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332211.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:12:32,090 INFO [finetune.py:992] (1/2) Epoch 19, batch 12000, loss[loss=0.1888, simple_loss=0.2634, pruned_loss=0.05711, over 6694.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2994, pruned_loss=0.06699, over 1683009.07 frames. 
], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:12:32,090 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 01:12:36,869 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2445, 4.8656, 5.2221, 4.8001, 4.9148, 4.9320, 5.2207, 5.1634], device='cuda:1'), covar=tensor([0.0341, 0.0402, 0.0267, 0.0241, 0.0540, 0.0309, 0.0264, 0.0208], device='cuda:1'), in_proj_covar=tensor([0.0268, 0.0269, 0.0292, 0.0264, 0.0265, 0.0263, 0.0241, 0.0215], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:12:38,172 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3723, 5.2814, 5.2731, 4.8235, 4.9474, 5.3688, 5.0170, 5.0475], device='cuda:1'), covar=tensor([0.0719, 0.1005, 0.0468, 0.1429, 0.0346, 0.0514, 0.1095, 0.0556], device='cuda:1'), in_proj_covar=tensor([0.0630, 0.0566, 0.0518, 0.0633, 0.0426, 0.0729, 0.0764, 0.0560], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 01:12:44,854 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([1.9234, 3.2961, 3.3283, 3.7300, 2.4059, 3.2105, 2.2877, 3.1935], device='cuda:1'), covar=tensor([0.2367, 0.1163, 0.1156, 0.0634, 0.1792, 0.1045, 0.2625, 0.1040], device='cuda:1'), in_proj_covar=tensor([0.0230, 0.0270, 0.0295, 0.0355, 0.0244, 0.0244, 0.0263, 0.0365], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:12:45,299 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5900, 3.5359, 3.5820, 3.4735, 3.5400, 3.4114, 3.6128, 3.6318], device='cuda:1'), covar=tensor([0.0514, 0.0378, 0.0390, 0.0270, 0.0477, 0.0369, 0.0385, 0.0392], device='cuda:1'), in_proj_covar=tensor([0.0268, 0.0269, 0.0292, 0.0264, 0.0265, 0.0263, 0.0241, 0.0215], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:12:48,261 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4220, 2.4436, 3.8679, 3.2495, 3.5996, 3.3966, 2.7607, 3.7625], device='cuda:1'), covar=tensor([0.0106, 0.0438, 0.0050, 0.0205, 0.0134, 0.0131, 0.0336, 0.0080], device='cuda:1'), in_proj_covar=tensor([0.0181, 0.0205, 0.0191, 0.0187, 0.0217, 0.0167, 0.0197, 0.0192], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:12:48,494 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3411, 3.0070, 4.9923, 2.4328, 2.5770, 3.6579, 3.0472, 3.5460], device='cuda:1'), covar=tensor([0.0591, 0.1631, 0.0087, 0.1489, 0.2410, 0.1673, 0.1772, 0.1236], device='cuda:1'), in_proj_covar=tensor([0.0229, 0.0229, 0.0250, 0.0181, 0.0228, 0.0280, 0.0219, 0.0258], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:12:49,556 INFO [finetune.py:1026] (1/2) Epoch 19, validation: loss=0.2852, simple_loss=0.36, pruned_loss=0.1052, over 1020973.00 frames. 
2023-05-19 01:12:49,557 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12856MB 2023-05-19 01:12:52,227 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6442, 4.5393, 4.6477, 4.6733, 4.4098, 4.4605, 4.3388, 4.5092], device='cuda:1'), covar=tensor([0.0877, 0.0704, 0.0894, 0.0644, 0.1766, 0.1190, 0.0594, 0.1346], device='cuda:1'), in_proj_covar=tensor([0.0536, 0.0703, 0.0614, 0.0625, 0.0837, 0.0735, 0.0566, 0.0486], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:13:23,351 INFO [finetune.py:992] (1/2) Epoch 19, batch 12050, loss[loss=0.2047, simple_loss=0.2973, pruned_loss=0.05603, over 7635.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2947, pruned_loss=0.06314, over 1709588.96 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:13:28,750 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6198, 3.6720, 3.4610, 3.2351, 3.0194, 2.8717, 3.4322, 2.3906], device='cuda:1'), covar=tensor([0.0427, 0.0155, 0.0169, 0.0231, 0.0356, 0.0398, 0.0196, 0.0562], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0164, 0.0167, 0.0194, 0.0200, 0.0199, 0.0175, 0.0203], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:13:29,370 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332273.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:13:48,246 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 01:13:49,765 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.177e+02 2.974e+02 3.380e+02 4.027e+02 9.522e+02, threshold=6.760e+02, percent-clipped=1.0 2023-05-19 01:13:51,282 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.38 vs. limit=5.0 2023-05-19 01:13:52,334 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332310.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:13:54,719 INFO [finetune.py:992] (1/2) Epoch 19, batch 12100, loss[loss=0.1792, simple_loss=0.2806, pruned_loss=0.03891, over 10356.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2937, pruned_loss=0.06226, over 1711595.77 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:13:55,571 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8475, 2.5169, 3.5333, 3.5532, 2.8529, 2.6633, 2.6753, 2.4296], device='cuda:1'), covar=tensor([0.1356, 0.2737, 0.0687, 0.0546, 0.1066, 0.2407, 0.2912, 0.3912], device='cuda:1'), in_proj_covar=tensor([0.0306, 0.0387, 0.0277, 0.0300, 0.0277, 0.0320, 0.0401, 0.0377], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:14:11,458 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332340.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:14:22,549 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332358.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:14:26,271 INFO [finetune.py:992] (1/2) Epoch 19, batch 12150, loss[loss=0.2462, simple_loss=0.317, pruned_loss=0.08768, over 7092.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2947, pruned_loss=0.06266, over 1711338.96 frames. 
], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:14:38,801 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4527, 3.0335, 3.7249, 2.3489, 2.5729, 3.1138, 2.9282, 3.1265], device='cuda:1'), covar=tensor([0.0595, 0.1244, 0.0355, 0.1476, 0.2219, 0.1624, 0.1501, 0.1335], device='cuda:1'), in_proj_covar=tensor([0.0228, 0.0230, 0.0250, 0.0181, 0.0229, 0.0281, 0.0220, 0.0258], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:14:43,032 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332390.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:14:53,115 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 3.104e+02 3.637e+02 4.356e+02 7.148e+02, threshold=7.274e+02, percent-clipped=3.0 2023-05-19 01:14:57,953 INFO [finetune.py:992] (1/2) Epoch 19, batch 12200, loss[loss=0.2475, simple_loss=0.3162, pruned_loss=0.08937, over 6588.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2957, pruned_loss=0.06377, over 1685590.88 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:15:03,543 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7877, 3.0681, 2.4286, 2.3223, 2.8167, 2.4038, 2.9329, 2.6872], device='cuda:1'), covar=tensor([0.0672, 0.0692, 0.1073, 0.1406, 0.0333, 0.1139, 0.0605, 0.0822], device='cuda:1'), in_proj_covar=tensor([0.0184, 0.0248, 0.0175, 0.0199, 0.0140, 0.0180, 0.0193, 0.0173], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:15:12,354 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332438.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:15:38,567 INFO [finetune.py:992] (1/2) Epoch 20, batch 0, loss[loss=0.1978, simple_loss=0.2942, pruned_loss=0.05072, over 10614.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2942, pruned_loss=0.05072, over 10614.00 frames. ], batch size: 69, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:15:38,567 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 01:15:53,914 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6790, 4.3640, 4.6730, 4.2930, 4.4489, 4.3603, 4.6633, 4.7062], device='cuda:1'), covar=tensor([0.0351, 0.0420, 0.0309, 0.0274, 0.0471, 0.0378, 0.0260, 0.0203], device='cuda:1'), in_proj_covar=tensor([0.0264, 0.0266, 0.0287, 0.0261, 0.0262, 0.0259, 0.0238, 0.0212], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:15:54,734 INFO [finetune.py:1026] (1/2) Epoch 20, validation: loss=0.2858, simple_loss=0.3601, pruned_loss=0.1058, over 1020973.00 frames. 2023-05-19 01:15:54,734 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12856MB 2023-05-19 01:15:59,718 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332455.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:16:22,863 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 01:16:30,241 INFO [finetune.py:992] (1/2) Epoch 20, batch 50, loss[loss=0.1813, simple_loss=0.2713, pruned_loss=0.04561, over 12137.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.263, pruned_loss=0.03981, over 536270.57 frames. 
], batch size: 39, lr: 3.12e-03, grad_scale: 8.0 2023-05-19 01:16:33,740 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332503.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:16:35,797 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.897e+02 3.544e+02 4.170e+02 7.708e+02, threshold=7.088e+02, percent-clipped=2.0 2023-05-19 01:16:35,882 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332506.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:16:48,396 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=332524.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:17:04,755 INFO [finetune.py:992] (1/2) Epoch 20, batch 100, loss[loss=0.1461, simple_loss=0.2283, pruned_loss=0.032, over 12348.00 frames. ], tot_loss[loss=0.1703, simple_loss=0.2622, pruned_loss=0.03921, over 941562.13 frames. ], batch size: 30, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:17:22,933 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332573.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:17:31,140 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332585.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:17:39,758 INFO [finetune.py:992] (1/2) Epoch 20, batch 150, loss[loss=0.1502, simple_loss=0.2481, pruned_loss=0.02611, over 12107.00 frames. ], tot_loss[loss=0.1684, simple_loss=0.2601, pruned_loss=0.03832, over 1265524.87 frames. ], batch size: 33, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:17:45,533 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 2.484e+02 2.986e+02 3.369e+02 5.653e+02, threshold=5.971e+02, percent-clipped=0.0 2023-05-19 01:17:51,324 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 01:17:56,335 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332621.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:18:05,437 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 01:18:09,167 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332640.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:18:14,492 INFO [finetune.py:992] (1/2) Epoch 20, batch 200, loss[loss=0.145, simple_loss=0.2261, pruned_loss=0.03197, over 12366.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2595, pruned_loss=0.03824, over 1514860.86 frames. ], batch size: 30, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:18:15,805 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.80 vs. 
limit=5.0 2023-05-19 01:18:21,681 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2353, 4.7605, 5.2326, 4.6050, 4.9203, 4.6890, 5.2843, 4.8995], device='cuda:1'), covar=tensor([0.0334, 0.0468, 0.0328, 0.0289, 0.0507, 0.0383, 0.0225, 0.0365], device='cuda:1'), in_proj_covar=tensor([0.0272, 0.0275, 0.0295, 0.0268, 0.0269, 0.0267, 0.0245, 0.0219], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:18:30,003 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.0154, 2.5371, 3.5568, 3.0238, 3.3506, 3.1591, 2.6480, 3.4738], device='cuda:1'), covar=tensor([0.0171, 0.0441, 0.0206, 0.0307, 0.0237, 0.0218, 0.0377, 0.0183], device='cuda:1'), in_proj_covar=tensor([0.0182, 0.0204, 0.0191, 0.0186, 0.0217, 0.0167, 0.0198, 0.0193], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:18:42,198 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332688.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:18:49,158 INFO [finetune.py:992] (1/2) Epoch 20, batch 250, loss[loss=0.1663, simple_loss=0.2664, pruned_loss=0.03305, over 11266.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2577, pruned_loss=0.038, over 1699646.09 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:18:54,813 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.816e+02 2.640e+02 2.995e+02 3.621e+02 7.109e+02, threshold=5.991e+02, percent-clipped=2.0 2023-05-19 01:19:24,208 INFO [finetune.py:992] (1/2) Epoch 20, batch 300, loss[loss=0.1628, simple_loss=0.2575, pruned_loss=0.03407, over 12355.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2557, pruned_loss=0.03757, over 1856728.19 frames. ], batch size: 36, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:19:45,015 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2958, 4.6720, 2.9165, 2.6364, 4.0052, 2.6059, 3.7354, 3.2838], device='cuda:1'), covar=tensor([0.0776, 0.0494, 0.1303, 0.1704, 0.0346, 0.1455, 0.0632, 0.0814], device='cuda:1'), in_proj_covar=tensor([0.0187, 0.0255, 0.0177, 0.0202, 0.0142, 0.0183, 0.0197, 0.0175], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:19:49,845 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5595, 3.5950, 3.2905, 3.1034, 2.7974, 2.6804, 3.4949, 2.2936], device='cuda:1'), covar=tensor([0.0439, 0.0171, 0.0225, 0.0248, 0.0429, 0.0422, 0.0167, 0.0588], device='cuda:1'), in_proj_covar=tensor([0.0197, 0.0165, 0.0169, 0.0194, 0.0202, 0.0201, 0.0176, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:19:59,233 INFO [finetune.py:992] (1/2) Epoch 20, batch 350, loss[loss=0.1548, simple_loss=0.236, pruned_loss=0.03681, over 12182.00 frames. ], tot_loss[loss=0.165, simple_loss=0.2553, pruned_loss=0.0374, over 1977300.34 frames. 
], batch size: 29, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:20:05,005 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+02 2.629e+02 3.196e+02 3.896e+02 7.714e+02, threshold=6.392e+02, percent-clipped=2.0 2023-05-19 01:20:05,140 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=332806.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:20:21,501 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=332829.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:20:26,662 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-05-19 01:20:34,314 INFO [finetune.py:992] (1/2) Epoch 20, batch 400, loss[loss=0.1765, simple_loss=0.2654, pruned_loss=0.04377, over 12028.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2541, pruned_loss=0.03695, over 2063784.11 frames. ], batch size: 42, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:20:38,542 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=332854.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:20:57,340 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=332880.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:21:04,441 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=332890.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:21:09,580 INFO [finetune.py:992] (1/2) Epoch 20, batch 450, loss[loss=0.163, simple_loss=0.259, pruned_loss=0.03347, over 12271.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2543, pruned_loss=0.0369, over 2137412.60 frames. ], batch size: 37, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:21:14,987 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.551e+02 3.007e+02 3.557e+02 1.238e+03, threshold=6.013e+02, percent-clipped=1.0 2023-05-19 01:21:43,953 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1100, 4.7480, 4.7437, 4.9215, 4.7761, 4.9962, 4.7273, 2.6890], device='cuda:1'), covar=tensor([0.0109, 0.0071, 0.0114, 0.0069, 0.0063, 0.0110, 0.0096, 0.0934], device='cuda:1'), in_proj_covar=tensor([0.0073, 0.0084, 0.0088, 0.0077, 0.0064, 0.0099, 0.0086, 0.0103], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:21:44,425 INFO [finetune.py:992] (1/2) Epoch 20, batch 500, loss[loss=0.1789, simple_loss=0.2739, pruned_loss=0.04198, over 12113.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2544, pruned_loss=0.03704, over 2191232.94 frames. ], batch size: 33, lr: 3.11e-03, grad_scale: 8.0 2023-05-19 01:21:55,939 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2023-05-19 01:22:19,328 INFO [finetune.py:992] (1/2) Epoch 20, batch 550, loss[loss=0.1681, simple_loss=0.2532, pruned_loss=0.04144, over 12050.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2539, pruned_loss=0.03677, over 2228614.24 frames. 
], batch size: 40, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:22:22,551 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333002.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:22:25,192 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.622e+02 3.186e+02 3.720e+02 6.953e+02, threshold=6.373e+02, percent-clipped=2.0 2023-05-19 01:22:39,358 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7244, 3.3066, 5.1708, 2.5459, 2.8096, 3.8039, 3.1660, 3.7734], device='cuda:1'), covar=tensor([0.0492, 0.1294, 0.0345, 0.1338, 0.2129, 0.1594, 0.1587, 0.1300], device='cuda:1'), in_proj_covar=tensor([0.0239, 0.0240, 0.0263, 0.0190, 0.0239, 0.0294, 0.0231, 0.0271], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:22:40,635 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333027.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:22:53,799 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8437, 2.8761, 4.8162, 4.8894, 2.7693, 2.6796, 3.0088, 2.2937], device='cuda:1'), covar=tensor([0.1802, 0.3474, 0.0431, 0.0461, 0.1559, 0.2795, 0.3124, 0.4419], device='cuda:1'), in_proj_covar=tensor([0.0312, 0.0397, 0.0282, 0.0307, 0.0283, 0.0327, 0.0410, 0.0383], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:22:54,909 INFO [finetune.py:992] (1/2) Epoch 20, batch 600, loss[loss=0.1498, simple_loss=0.2364, pruned_loss=0.03155, over 12172.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.253, pruned_loss=0.03635, over 2265771.40 frames. ], batch size: 29, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:23:05,516 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333063.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:23:17,078 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4248, 2.4296, 2.9719, 4.2176, 2.1739, 4.2251, 4.3582, 4.4087], device='cuda:1'), covar=tensor([0.0168, 0.1473, 0.0644, 0.0194, 0.1518, 0.0263, 0.0164, 0.0149], device='cuda:1'), in_proj_covar=tensor([0.0124, 0.0207, 0.0184, 0.0124, 0.0192, 0.0182, 0.0181, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:23:19,135 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3528, 2.5605, 3.6524, 4.3524, 3.6131, 4.4023, 3.8135, 2.9299], device='cuda:1'), covar=tensor([0.0053, 0.0425, 0.0145, 0.0050, 0.0174, 0.0073, 0.0122, 0.0435], device='cuda:1'), in_proj_covar=tensor([0.0091, 0.0123, 0.0104, 0.0082, 0.0104, 0.0117, 0.0103, 0.0137], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:23:23,241 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333088.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:23:30,125 INFO [finetune.py:992] (1/2) Epoch 20, batch 650, loss[loss=0.1749, simple_loss=0.2774, pruned_loss=0.03622, over 12353.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2528, pruned_loss=0.0363, over 2287384.89 frames. 
], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:23:35,421 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.653e+02 3.049e+02 3.616e+02 6.869e+02, threshold=6.098e+02, percent-clipped=2.0 2023-05-19 01:24:04,334 INFO [finetune.py:992] (1/2) Epoch 20, batch 700, loss[loss=0.1786, simple_loss=0.2678, pruned_loss=0.04469, over 12155.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.253, pruned_loss=0.03658, over 2309824.41 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:24:09,234 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4590, 2.5602, 3.0994, 4.2285, 2.1941, 4.2715, 4.3957, 4.4339], device='cuda:1'), covar=tensor([0.0132, 0.1379, 0.0592, 0.0206, 0.1554, 0.0247, 0.0175, 0.0147], device='cuda:1'), in_proj_covar=tensor([0.0124, 0.0206, 0.0184, 0.0124, 0.0191, 0.0182, 0.0181, 0.0129], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:24:11,298 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5500, 3.4033, 3.1331, 3.0296, 2.7864, 2.7047, 3.3683, 2.2506], device='cuda:1'), covar=tensor([0.0426, 0.0189, 0.0253, 0.0235, 0.0446, 0.0420, 0.0172, 0.0542], device='cuda:1'), in_proj_covar=tensor([0.0198, 0.0166, 0.0171, 0.0195, 0.0203, 0.0201, 0.0176, 0.0208], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:24:25,991 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333180.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:24:30,022 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333185.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:24:38,814 INFO [finetune.py:992] (1/2) Epoch 20, batch 750, loss[loss=0.1717, simple_loss=0.2632, pruned_loss=0.04007, over 12161.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.2542, pruned_loss=0.03708, over 2321742.67 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:24:44,367 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.641e+02 2.899e+02 3.426e+02 5.740e+02, threshold=5.798e+02, percent-clipped=0.0 2023-05-19 01:24:55,988 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333222.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:24:59,979 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333228.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:25:13,940 INFO [finetune.py:992] (1/2) Epoch 20, batch 800, loss[loss=0.1681, simple_loss=0.2681, pruned_loss=0.03408, over 12005.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2541, pruned_loss=0.03688, over 2326606.54 frames. 
], batch size: 40, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:25:21,546 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4519, 3.5454, 3.2572, 3.0700, 2.7836, 2.7082, 3.4896, 2.2148], device='cuda:1'), covar=tensor([0.0480, 0.0164, 0.0189, 0.0255, 0.0456, 0.0398, 0.0153, 0.0610], device='cuda:1'), in_proj_covar=tensor([0.0199, 0.0167, 0.0172, 0.0196, 0.0204, 0.0203, 0.0177, 0.0210], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:25:37,957 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333283.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:25:40,038 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4026, 4.8349, 3.0227, 2.8917, 4.1517, 2.8939, 4.1328, 3.6076], device='cuda:1'), covar=tensor([0.0753, 0.0587, 0.1266, 0.1491, 0.0315, 0.1183, 0.0470, 0.0684], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0261, 0.0181, 0.0205, 0.0145, 0.0187, 0.0202, 0.0178], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:25:48,116 INFO [finetune.py:992] (1/2) Epoch 20, batch 850, loss[loss=0.1585, simple_loss=0.2567, pruned_loss=0.03021, over 12056.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2534, pruned_loss=0.03684, over 2340954.81 frames. ], batch size: 40, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:25:53,808 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.586e+02 2.626e+02 3.010e+02 3.698e+02 7.947e+02, threshold=6.019e+02, percent-clipped=2.0 2023-05-19 01:25:57,145 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-05-19 01:26:23,797 INFO [finetune.py:992] (1/2) Epoch 20, batch 900, loss[loss=0.1683, simple_loss=0.2496, pruned_loss=0.04353, over 12130.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2529, pruned_loss=0.03625, over 2358477.57 frames. ], batch size: 30, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:26:30,970 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333358.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:26:48,955 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333383.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:26:49,971 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.62 vs. limit=5.0 2023-05-19 01:26:59,194 INFO [finetune.py:992] (1/2) Epoch 20, batch 950, loss[loss=0.1375, simple_loss=0.2177, pruned_loss=0.02869, over 12008.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2521, pruned_loss=0.03615, over 2356300.08 frames. 
], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:27:04,592 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2291, 3.5123, 3.4850, 3.8799, 2.6132, 3.4037, 2.5157, 3.1912], device='cuda:1'), covar=tensor([0.1794, 0.0899, 0.0993, 0.0626, 0.1422, 0.0862, 0.2028, 0.0907], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0272, 0.0298, 0.0360, 0.0246, 0.0247, 0.0266, 0.0371], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:27:05,025 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.539e+02 2.535e+02 2.875e+02 3.415e+02 5.071e+02, threshold=5.750e+02, percent-clipped=0.0 2023-05-19 01:27:05,207 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=333406.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:27:19,028 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1468, 6.0320, 5.6120, 5.5681, 6.0939, 5.3554, 5.4246, 5.5567], device='cuda:1'), covar=tensor([0.1551, 0.1025, 0.1293, 0.1971, 0.0844, 0.2408, 0.2232, 0.1279], device='cuda:1'), in_proj_covar=tensor([0.0363, 0.0509, 0.0413, 0.0458, 0.0467, 0.0446, 0.0407, 0.0393], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:27:25,989 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3350, 5.1269, 5.2530, 5.2876, 4.7943, 4.7640, 4.7529, 5.1129], device='cuda:1'), covar=tensor([0.0845, 0.0810, 0.0894, 0.0784, 0.2555, 0.2007, 0.0637, 0.1484], device='cuda:1'), in_proj_covar=tensor([0.0566, 0.0742, 0.0649, 0.0659, 0.0882, 0.0777, 0.0592, 0.0508], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:27:34,122 INFO [finetune.py:992] (1/2) Epoch 20, batch 1000, loss[loss=0.1541, simple_loss=0.2396, pruned_loss=0.03434, over 12349.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2521, pruned_loss=0.03596, over 2356370.75 frames. ], batch size: 31, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:27:47,280 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=333467.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:28:00,551 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333485.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:28:09,360 INFO [finetune.py:992] (1/2) Epoch 20, batch 1050, loss[loss=0.1592, simple_loss=0.25, pruned_loss=0.03419, over 12182.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2511, pruned_loss=0.03561, over 2358259.33 frames. 
], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:28:14,963 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.537e+02 2.902e+02 3.628e+02 7.311e+02, threshold=5.804e+02, percent-clipped=3.0 2023-05-19 01:28:28,466 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2648, 2.6605, 3.8445, 3.2832, 3.6407, 3.3935, 2.8066, 3.7058], device='cuda:1'), covar=tensor([0.0152, 0.0396, 0.0160, 0.0234, 0.0195, 0.0195, 0.0412, 0.0146], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0212, 0.0200, 0.0194, 0.0227, 0.0175, 0.0206, 0.0201], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:28:34,553 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333533.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:28:44,741 INFO [finetune.py:992] (1/2) Epoch 20, batch 1100, loss[loss=0.1507, simple_loss=0.2425, pruned_loss=0.02946, over 12018.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2507, pruned_loss=0.03546, over 2360934.40 frames. ], batch size: 31, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:29:05,965 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333578.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:29:19,781 INFO [finetune.py:992] (1/2) Epoch 20, batch 1150, loss[loss=0.1569, simple_loss=0.2524, pruned_loss=0.03073, over 12358.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2508, pruned_loss=0.0354, over 2360852.16 frames. ], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:29:25,533 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.732e+02 2.683e+02 3.097e+02 3.738e+02 5.412e+02, threshold=6.193e+02, percent-clipped=0.0 2023-05-19 01:29:55,551 INFO [finetune.py:992] (1/2) Epoch 20, batch 1200, loss[loss=0.1463, simple_loss=0.2327, pruned_loss=0.02996, over 12001.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2501, pruned_loss=0.03544, over 2366600.90 frames. ], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:30:02,753 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333658.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:30:20,505 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333683.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:30:30,975 INFO [finetune.py:992] (1/2) Epoch 20, batch 1250, loss[loss=0.1445, simple_loss=0.2233, pruned_loss=0.03288, over 11815.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2509, pruned_loss=0.03551, over 2362248.81 frames. ], batch size: 26, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:30:36,592 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.693e+02 2.535e+02 2.875e+02 3.333e+02 5.820e+02, threshold=5.749e+02, percent-clipped=0.0 2023-05-19 01:30:36,671 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333706.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:30:53,897 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333731.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:31:05,844 INFO [finetune.py:992] (1/2) Epoch 20, batch 1300, loss[loss=0.1608, simple_loss=0.2544, pruned_loss=0.03358, over 12162.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2512, pruned_loss=0.03553, over 2365244.25 frames. 
], batch size: 36, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:31:15,452 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=333762.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:31:40,841 INFO [finetune.py:992] (1/2) Epoch 20, batch 1350, loss[loss=0.1544, simple_loss=0.233, pruned_loss=0.03791, over 12005.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2514, pruned_loss=0.03533, over 2373684.42 frames. ], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:31:46,704 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.512e+02 2.833e+02 3.283e+02 6.251e+02, threshold=5.665e+02, percent-clipped=1.0 2023-05-19 01:32:16,453 INFO [finetune.py:992] (1/2) Epoch 20, batch 1400, loss[loss=0.159, simple_loss=0.2437, pruned_loss=0.03714, over 12175.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2515, pruned_loss=0.03516, over 2379218.82 frames. ], batch size: 31, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:32:34,326 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4370, 4.8750, 4.3629, 5.1407, 4.6874, 3.2925, 4.4214, 3.1393], device='cuda:1'), covar=tensor([0.0892, 0.0755, 0.1469, 0.0536, 0.1174, 0.1536, 0.1074, 0.3584], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0384, 0.0368, 0.0346, 0.0378, 0.0283, 0.0356, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:32:37,607 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=333878.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:32:51,692 INFO [finetune.py:992] (1/2) Epoch 20, batch 1450, loss[loss=0.1519, simple_loss=0.2483, pruned_loss=0.02778, over 12204.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.251, pruned_loss=0.03498, over 2379294.33 frames. ], batch size: 35, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:32:56,452 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 01:32:57,257 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.486e+02 2.911e+02 3.434e+02 5.921e+02, threshold=5.822e+02, percent-clipped=1.0 2023-05-19 01:33:07,709 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8159, 3.2527, 2.4172, 2.2169, 2.9673, 2.3247, 3.0865, 2.6634], device='cuda:1'), covar=tensor([0.0640, 0.0649, 0.1020, 0.1460, 0.0303, 0.1234, 0.0592, 0.0792], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0263, 0.0182, 0.0206, 0.0147, 0.0188, 0.0204, 0.0178], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:33:10,954 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=333926.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:33:20,273 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1123, 4.7279, 4.7929, 4.9428, 4.7007, 4.9456, 4.8927, 2.6057], device='cuda:1'), covar=tensor([0.0095, 0.0069, 0.0098, 0.0059, 0.0061, 0.0104, 0.0086, 0.0902], device='cuda:1'), in_proj_covar=tensor([0.0073, 0.0084, 0.0088, 0.0077, 0.0064, 0.0098, 0.0086, 0.0102], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:33:27,094 INFO [finetune.py:992] (1/2) Epoch 20, batch 1500, loss[loss=0.1981, simple_loss=0.2788, pruned_loss=0.05875, over 7598.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2512, pruned_loss=0.03514, over 2377228.44 frames. 
], batch size: 98, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:33:28,799 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1515, 4.3268, 4.3706, 4.4348, 2.9309, 4.3027, 3.0754, 4.3382], device='cuda:1'), covar=tensor([0.1770, 0.0624, 0.0650, 0.0509, 0.1229, 0.0542, 0.1597, 0.1146], device='cuda:1'), in_proj_covar=tensor([0.0237, 0.0276, 0.0303, 0.0366, 0.0250, 0.0250, 0.0270, 0.0377], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:33:33,195 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.32 vs. limit=5.0 2023-05-19 01:33:42,365 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.0316, 2.1768, 2.8668, 3.0410, 2.9874, 3.1489, 2.9709, 2.4566], device='cuda:1'), covar=tensor([0.0118, 0.0409, 0.0195, 0.0093, 0.0157, 0.0098, 0.0166, 0.0374], device='cuda:1'), in_proj_covar=tensor([0.0091, 0.0123, 0.0105, 0.0082, 0.0104, 0.0117, 0.0103, 0.0137], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:34:02,323 INFO [finetune.py:992] (1/2) Epoch 20, batch 1550, loss[loss=0.139, simple_loss=0.2268, pruned_loss=0.02562, over 12006.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.251, pruned_loss=0.03492, over 2383139.74 frames. ], batch size: 28, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:34:10,934 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.806e+02 2.627e+02 3.050e+02 3.629e+02 6.621e+02, threshold=6.100e+02, percent-clipped=1.0 2023-05-19 01:34:18,065 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334016.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:34:29,487 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.66 vs. limit=5.0 2023-05-19 01:34:40,157 INFO [finetune.py:992] (1/2) Epoch 20, batch 1600, loss[loss=0.2141, simple_loss=0.2911, pruned_loss=0.0686, over 7961.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2507, pruned_loss=0.03491, over 2381935.97 frames. ], batch size: 98, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:34:50,080 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=334062.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:35:00,390 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=334077.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 01:35:15,159 INFO [finetune.py:992] (1/2) Epoch 20, batch 1650, loss[loss=0.1559, simple_loss=0.247, pruned_loss=0.03242, over 12157.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2517, pruned_loss=0.03572, over 2370274.88 frames. ], batch size: 34, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:35:20,551 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.744e+02 2.662e+02 2.886e+02 3.519e+02 5.131e+02, threshold=5.773e+02, percent-clipped=0.0 2023-05-19 01:35:23,370 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=334110.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:35:50,188 INFO [finetune.py:992] (1/2) Epoch 20, batch 1700, loss[loss=0.1719, simple_loss=0.2606, pruned_loss=0.04159, over 12343.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03586, over 2362974.16 frames. 
], batch size: 36, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:36:11,220 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5534, 5.1032, 5.5324, 4.8797, 5.1975, 4.9366, 5.6126, 5.2041], device='cuda:1'), covar=tensor([0.0270, 0.0362, 0.0271, 0.0238, 0.0395, 0.0396, 0.0164, 0.0234], device='cuda:1'), in_proj_covar=tensor([0.0281, 0.0285, 0.0307, 0.0276, 0.0278, 0.0277, 0.0253, 0.0226], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:36:24,991 INFO [finetune.py:992] (1/2) Epoch 20, batch 1750, loss[loss=0.1395, simple_loss=0.2175, pruned_loss=0.03073, over 11854.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2521, pruned_loss=0.0358, over 2362062.16 frames. ], batch size: 26, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:36:29,194 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-05-19 01:36:30,881 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.634e+02 3.119e+02 3.600e+02 7.041e+02, threshold=6.237e+02, percent-clipped=1.0 2023-05-19 01:36:35,986 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.6646, 5.4310, 5.5647, 5.6262, 5.2789, 5.2943, 5.0254, 5.5970], device='cuda:1'), covar=tensor([0.0770, 0.0660, 0.0794, 0.0606, 0.1848, 0.1466, 0.0590, 0.0961], device='cuda:1'), in_proj_covar=tensor([0.0570, 0.0744, 0.0649, 0.0661, 0.0887, 0.0779, 0.0594, 0.0508], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:36:38,891 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5536, 2.9150, 5.0714, 2.3833, 2.5347, 4.0485, 2.9202, 3.8747], device='cuda:1'), covar=tensor([0.0452, 0.1500, 0.0302, 0.1408, 0.2216, 0.1180, 0.1718, 0.1100], device='cuda:1'), in_proj_covar=tensor([0.0244, 0.0244, 0.0268, 0.0192, 0.0244, 0.0298, 0.0233, 0.0276], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:37:00,810 INFO [finetune.py:992] (1/2) Epoch 20, batch 1800, loss[loss=0.1475, simple_loss=0.2312, pruned_loss=0.03196, over 12189.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2515, pruned_loss=0.03585, over 2363835.48 frames. ], batch size: 29, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:37:36,336 INFO [finetune.py:992] (1/2) Epoch 20, batch 1850, loss[loss=0.1703, simple_loss=0.2689, pruned_loss=0.03589, over 12267.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2512, pruned_loss=0.03545, over 2367680.63 frames. 
], batch size: 37, lr: 3.11e-03, grad_scale: 16.0 2023-05-19 01:37:42,082 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.589e+02 2.626e+02 2.937e+02 3.598e+02 5.513e+02, threshold=5.873e+02, percent-clipped=0.0 2023-05-19 01:37:53,508 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3707, 4.7436, 2.9979, 2.7521, 4.1881, 2.8553, 4.0100, 3.3472], device='cuda:1'), covar=tensor([0.0763, 0.0581, 0.1151, 0.1574, 0.0264, 0.1240, 0.0529, 0.0841], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0266, 0.0182, 0.0207, 0.0147, 0.0189, 0.0205, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:38:09,362 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5307, 2.5340, 3.6206, 4.4968, 3.8769, 4.5911, 3.9659, 3.3701], device='cuda:1'), covar=tensor([0.0050, 0.0475, 0.0171, 0.0058, 0.0147, 0.0078, 0.0141, 0.0357], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0124, 0.0105, 0.0083, 0.0105, 0.0119, 0.0104, 0.0138], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:38:11,283 INFO [finetune.py:992] (1/2) Epoch 20, batch 1900, loss[loss=0.1509, simple_loss=0.2461, pruned_loss=0.02787, over 12249.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.25, pruned_loss=0.03489, over 2368751.57 frames. ], batch size: 32, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:38:24,340 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 01:38:25,398 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1558, 4.7819, 4.8718, 5.0349, 4.7848, 5.0431, 4.9072, 2.5079], device='cuda:1'), covar=tensor([0.0106, 0.0078, 0.0114, 0.0058, 0.0058, 0.0102, 0.0104, 0.0987], device='cuda:1'), in_proj_covar=tensor([0.0074, 0.0086, 0.0089, 0.0079, 0.0065, 0.0100, 0.0088, 0.0104], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:38:28,190 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=334372.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 01:38:46,673 INFO [finetune.py:992] (1/2) Epoch 20, batch 1950, loss[loss=0.154, simple_loss=0.248, pruned_loss=0.02998, over 12151.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2505, pruned_loss=0.03504, over 2378623.62 frames. ], batch size: 34, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:38:53,058 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.618e+02 2.924e+02 3.581e+02 8.497e+02, threshold=5.848e+02, percent-clipped=1.0 2023-05-19 01:38:54,566 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7368, 4.6105, 4.5874, 4.6284, 4.3297, 4.7839, 4.7337, 4.8772], device='cuda:1'), covar=tensor([0.0357, 0.0190, 0.0224, 0.0378, 0.0761, 0.0338, 0.0175, 0.0203], device='cuda:1'), in_proj_covar=tensor([0.0206, 0.0208, 0.0202, 0.0259, 0.0249, 0.0231, 0.0187, 0.0244], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 01:39:22,123 INFO [finetune.py:992] (1/2) Epoch 20, batch 2000, loss[loss=0.2034, simple_loss=0.2946, pruned_loss=0.05613, over 12131.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2505, pruned_loss=0.035, over 2372055.21 frames. 
], batch size: 39, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:39:56,210 INFO [finetune.py:992] (1/2) Epoch 20, batch 2050, loss[loss=0.1498, simple_loss=0.2333, pruned_loss=0.03312, over 12159.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2512, pruned_loss=0.03535, over 2369097.33 frames. ], batch size: 29, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:40:01,610 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.895e+02 2.676e+02 3.222e+02 3.975e+02 1.107e+03, threshold=6.444e+02, percent-clipped=4.0 2023-05-19 01:40:07,667 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.47 vs. limit=2.0 2023-05-19 01:40:26,156 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1815, 2.6121, 3.7675, 3.0324, 3.5126, 3.2183, 2.7534, 3.6485], device='cuda:1'), covar=tensor([0.0162, 0.0385, 0.0170, 0.0270, 0.0171, 0.0236, 0.0378, 0.0155], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0212, 0.0200, 0.0195, 0.0227, 0.0175, 0.0206, 0.0201], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:40:31,437 INFO [finetune.py:992] (1/2) Epoch 20, batch 2100, loss[loss=0.1597, simple_loss=0.2526, pruned_loss=0.03342, over 12194.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2514, pruned_loss=0.03522, over 2374725.16 frames. ], batch size: 35, lr: 3.10e-03, grad_scale: 16.0 2023-05-19 01:41:06,352 INFO [finetune.py:992] (1/2) Epoch 20, batch 2150, loss[loss=0.1582, simple_loss=0.2575, pruned_loss=0.02942, over 12299.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2518, pruned_loss=0.03531, over 2378081.52 frames. ], batch size: 37, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:41:08,860 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5524, 2.7850, 3.6773, 4.5546, 3.9526, 4.6012, 3.9500, 3.3714], device='cuda:1'), covar=tensor([0.0046, 0.0392, 0.0163, 0.0046, 0.0120, 0.0086, 0.0161, 0.0350], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0107, 0.0084, 0.0106, 0.0120, 0.0106, 0.0139], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:41:09,615 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8421, 3.8866, 3.3836, 3.1783, 2.9286, 2.8485, 3.8011, 2.3916], device='cuda:1'), covar=tensor([0.0402, 0.0140, 0.0216, 0.0272, 0.0501, 0.0468, 0.0170, 0.0623], device='cuda:1'), in_proj_covar=tensor([0.0202, 0.0168, 0.0176, 0.0199, 0.0208, 0.0207, 0.0181, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:41:12,932 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.666e+02 3.122e+02 3.748e+02 7.346e+02, threshold=6.243e+02, percent-clipped=1.0 2023-05-19 01:41:14,537 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334609.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:41:33,991 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4605, 2.4413, 3.5569, 4.4246, 3.7603, 4.5054, 3.8780, 3.2740], device='cuda:1'), covar=tensor([0.0060, 0.0464, 0.0200, 0.0064, 0.0177, 0.0077, 0.0175, 0.0375], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0107, 0.0084, 0.0106, 0.0120, 0.0106, 0.0140], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:41:41,320 INFO [finetune.py:992] (1/2) Epoch 20, batch 2200, 
loss[loss=0.1981, simple_loss=0.2817, pruned_loss=0.05722, over 8371.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2516, pruned_loss=0.03513, over 2380564.76 frames. ], batch size: 97, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:41:57,606 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=334670.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:41:58,830 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=334672.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 01:42:01,932 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.75 vs. limit=5.0 2023-05-19 01:42:17,550 INFO [finetune.py:992] (1/2) Epoch 20, batch 2250, loss[loss=0.169, simple_loss=0.2658, pruned_loss=0.03609, over 12190.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2512, pruned_loss=0.03507, over 2379036.59 frames. ], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:42:23,678 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.619e+02 2.718e+02 3.096e+02 3.651e+02 6.621e+02, threshold=6.192e+02, percent-clipped=1.0 2023-05-19 01:42:32,787 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=334720.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:42:51,928 INFO [finetune.py:992] (1/2) Epoch 20, batch 2300, loss[loss=0.1359, simple_loss=0.2211, pruned_loss=0.02538, over 12204.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.251, pruned_loss=0.03495, over 2369713.20 frames. ], batch size: 29, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:43:26,039 INFO [finetune.py:992] (1/2) Epoch 20, batch 2350, loss[loss=0.1683, simple_loss=0.2612, pruned_loss=0.03765, over 12273.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2511, pruned_loss=0.035, over 2376921.15 frames. ], batch size: 37, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:43:32,584 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.684e+02 2.563e+02 3.097e+02 3.636e+02 6.081e+02, threshold=6.194e+02, percent-clipped=0.0 2023-05-19 01:44:02,060 INFO [finetune.py:992] (1/2) Epoch 20, batch 2400, loss[loss=0.21, simple_loss=0.2924, pruned_loss=0.0638, over 8576.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2517, pruned_loss=0.0353, over 2373496.81 frames. ], batch size: 98, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:44:35,735 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.16 vs. limit=5.0 2023-05-19 01:44:36,691 INFO [finetune.py:992] (1/2) Epoch 20, batch 2450, loss[loss=0.1537, simple_loss=0.2447, pruned_loss=0.03131, over 12291.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2507, pruned_loss=0.03498, over 2379643.54 frames. ], batch size: 33, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:44:42,906 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.723e+02 2.661e+02 3.134e+02 3.717e+02 6.499e+02, threshold=6.267e+02, percent-clipped=1.0 2023-05-19 01:45:08,256 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334943.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:11,550 INFO [finetune.py:992] (1/2) Epoch 20, batch 2500, loss[loss=0.152, simple_loss=0.2465, pruned_loss=0.02876, over 12364.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2501, pruned_loss=0.03476, over 2377177.26 frames. 
], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:45:13,820 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=334951.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:23,820 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 01:45:24,224 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=334965.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:45:47,872 INFO [finetune.py:992] (1/2) Epoch 20, batch 2550, loss[loss=0.1461, simple_loss=0.2296, pruned_loss=0.03129, over 12326.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2501, pruned_loss=0.03477, over 2376334.88 frames. ], batch size: 28, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:45:52,553 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335004.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:54,435 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.500e+02 2.828e+02 3.369e+02 5.614e+02, threshold=5.656e+02, percent-clipped=0.0 2023-05-19 01:45:58,206 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335012.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:45:59,579 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=335014.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:46:07,411 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2023-05-19 01:46:19,353 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-05-19 01:46:23,151 INFO [finetune.py:992] (1/2) Epoch 20, batch 2600, loss[loss=0.153, simple_loss=0.2508, pruned_loss=0.02761, over 12305.00 frames. ], tot_loss[loss=0.1589, simple_loss=0.2493, pruned_loss=0.03426, over 2386429.97 frames. ], batch size: 34, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:46:41,987 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335075.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:46:58,516 INFO [finetune.py:992] (1/2) Epoch 20, batch 2650, loss[loss=0.1615, simple_loss=0.2477, pruned_loss=0.03765, over 12369.00 frames. ], tot_loss[loss=0.1594, simple_loss=0.2497, pruned_loss=0.03454, over 2386032.61 frames. ], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:47:04,864 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.548e+02 2.565e+02 2.983e+02 3.518e+02 7.202e+02, threshold=5.967e+02, percent-clipped=2.0 2023-05-19 01:47:18,725 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.0822, 4.0029, 4.0516, 4.3844, 2.7169, 3.8435, 2.5889, 4.1964], device='cuda:1'), covar=tensor([0.1793, 0.0795, 0.0967, 0.0696, 0.1361, 0.0720, 0.1973, 0.1048], device='cuda:1'), in_proj_covar=tensor([0.0235, 0.0274, 0.0302, 0.0365, 0.0247, 0.0250, 0.0266, 0.0374], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:47:33,762 INFO [finetune.py:992] (1/2) Epoch 20, batch 2700, loss[loss=0.1681, simple_loss=0.2593, pruned_loss=0.03841, over 12094.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.2497, pruned_loss=0.03467, over 2378890.07 frames. 
], batch size: 38, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:47:55,487 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9371, 2.3576, 3.5261, 2.8903, 3.2747, 3.0394, 2.5640, 3.4199], device='cuda:1'), covar=tensor([0.0202, 0.0484, 0.0190, 0.0347, 0.0205, 0.0284, 0.0437, 0.0167], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0214, 0.0202, 0.0197, 0.0228, 0.0177, 0.0207, 0.0203], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:47:58,898 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8834, 3.9553, 3.5040, 3.3083, 3.1082, 2.9842, 3.9411, 2.5704], device='cuda:1'), covar=tensor([0.0409, 0.0131, 0.0190, 0.0274, 0.0401, 0.0443, 0.0130, 0.0529], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0168, 0.0175, 0.0198, 0.0206, 0.0206, 0.0181, 0.0212], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:48:02,363 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=335189.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:48:08,470 INFO [finetune.py:992] (1/2) Epoch 20, batch 2750, loss[loss=0.1507, simple_loss=0.2315, pruned_loss=0.03496, over 11711.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.2493, pruned_loss=0.03446, over 2384877.33 frames. ], batch size: 26, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:48:14,916 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.771e+02 2.685e+02 3.034e+02 3.622e+02 7.994e+02, threshold=6.067e+02, percent-clipped=2.0 2023-05-19 01:48:44,017 INFO [finetune.py:992] (1/2) Epoch 20, batch 2800, loss[loss=0.1464, simple_loss=0.2438, pruned_loss=0.02452, over 12363.00 frames. ], tot_loss[loss=0.1594, simple_loss=0.2499, pruned_loss=0.03439, over 2388141.91 frames. ], batch size: 36, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:48:45,569 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335250.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:48:56,065 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335265.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 01:49:00,899 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6923, 2.8626, 3.3946, 4.5290, 2.5700, 4.4766, 4.6376, 4.6803], device='cuda:1'), covar=tensor([0.0191, 0.1260, 0.0500, 0.0173, 0.1432, 0.0297, 0.0212, 0.0118], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0211, 0.0191, 0.0128, 0.0196, 0.0188, 0.0188, 0.0132], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 01:49:19,245 INFO [finetune.py:992] (1/2) Epoch 20, batch 2850, loss[loss=0.1406, simple_loss=0.2243, pruned_loss=0.02842, over 12187.00 frames. ], tot_loss[loss=0.1593, simple_loss=0.2498, pruned_loss=0.0344, over 2385439.86 frames. 
], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:49:19,960 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335299.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:49:25,407 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.469e+02 2.827e+02 3.297e+02 5.034e+02, threshold=5.653e+02, percent-clipped=0.0 2023-05-19 01:49:25,513 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335307.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:49:29,187 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5804, 5.1485, 5.5564, 4.9683, 5.2940, 4.9917, 5.6238, 5.1710], device='cuda:1'), covar=tensor([0.0254, 0.0387, 0.0273, 0.0212, 0.0403, 0.0322, 0.0205, 0.0222], device='cuda:1'), in_proj_covar=tensor([0.0287, 0.0290, 0.0315, 0.0282, 0.0284, 0.0282, 0.0259, 0.0232], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:49:29,827 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335313.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 01:49:54,084 INFO [finetune.py:992] (1/2) Epoch 20, batch 2900, loss[loss=0.1919, simple_loss=0.2782, pruned_loss=0.05273, over 12141.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.25, pruned_loss=0.03455, over 2389441.21 frames. ], batch size: 36, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:50:09,298 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335370.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:50:29,203 INFO [finetune.py:992] (1/2) Epoch 20, batch 2950, loss[loss=0.134, simple_loss=0.2219, pruned_loss=0.02308, over 12018.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.2504, pruned_loss=0.0347, over 2379052.26 frames. ], batch size: 28, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:50:33,452 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. 
limit=2.0 2023-05-19 01:50:35,695 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.710e+02 2.492e+02 3.064e+02 3.622e+02 5.950e+02, threshold=6.129e+02, percent-clipped=2.0 2023-05-19 01:50:38,466 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4820, 5.2446, 5.3849, 5.4492, 5.0420, 5.1338, 4.8132, 5.4254], device='cuda:1'), covar=tensor([0.0728, 0.0680, 0.0915, 0.0616, 0.2064, 0.1527, 0.0651, 0.0898], device='cuda:1'), in_proj_covar=tensor([0.0584, 0.0759, 0.0662, 0.0671, 0.0911, 0.0795, 0.0605, 0.0517], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 01:50:57,096 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.0261, 2.5345, 3.6206, 3.0058, 3.4325, 3.1744, 2.6616, 3.4881], device='cuda:1'), covar=tensor([0.0188, 0.0410, 0.0186, 0.0323, 0.0186, 0.0241, 0.0443, 0.0185], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0215, 0.0203, 0.0199, 0.0231, 0.0179, 0.0209, 0.0205], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:51:03,504 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6026, 3.6440, 3.2497, 3.1183, 2.8713, 2.7840, 3.6474, 2.2925], device='cuda:1'), covar=tensor([0.0481, 0.0154, 0.0250, 0.0284, 0.0444, 0.0460, 0.0158, 0.0649], device='cuda:1'), in_proj_covar=tensor([0.0203, 0.0170, 0.0177, 0.0200, 0.0210, 0.0208, 0.0183, 0.0215], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:51:04,664 INFO [finetune.py:992] (1/2) Epoch 20, batch 3000, loss[loss=0.1348, simple_loss=0.2171, pruned_loss=0.02624, over 11987.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2505, pruned_loss=0.03457, over 2381602.46 frames. ], batch size: 28, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:51:04,664 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 01:51:12,706 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0065, 2.3900, 3.5365, 4.1276, 3.7575, 4.1212, 3.7845, 2.7331], device='cuda:1'), covar=tensor([0.0069, 0.0446, 0.0145, 0.0056, 0.0109, 0.0084, 0.0129, 0.0480], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0125, 0.0106, 0.0083, 0.0106, 0.0119, 0.0106, 0.0140], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:51:22,069 INFO [finetune.py:1026] (1/2) Epoch 20, validation: loss=0.3175, simple_loss=0.3915, pruned_loss=0.1217, over 1020973.00 frames. 2023-05-19 01:51:22,070 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12856MB 2023-05-19 01:51:52,883 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-05-19 01:52:01,128 INFO [finetune.py:992] (1/2) Epoch 20, batch 3050, loss[loss=0.1647, simple_loss=0.2586, pruned_loss=0.03537, over 12311.00 frames. ], tot_loss[loss=0.1593, simple_loss=0.2496, pruned_loss=0.03445, over 2387459.52 frames. 
], batch size: 34, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:52:07,453 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.677e+02 2.598e+02 2.989e+02 3.407e+02 6.082e+02, threshold=5.978e+02, percent-clipped=0.0 2023-05-19 01:52:34,985 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=335545.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:52:36,929 INFO [finetune.py:992] (1/2) Epoch 20, batch 3100, loss[loss=0.1608, simple_loss=0.2535, pruned_loss=0.03403, over 12139.00 frames. ], tot_loss[loss=0.1596, simple_loss=0.2498, pruned_loss=0.03474, over 2388558.26 frames. ], batch size: 39, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:53:11,909 INFO [finetune.py:992] (1/2) Epoch 20, batch 3150, loss[loss=0.1485, simple_loss=0.2399, pruned_loss=0.02851, over 12177.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.2499, pruned_loss=0.03497, over 2383207.53 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:53:12,631 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335599.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:18,348 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.672e+02 3.275e+02 4.152e+02 4.460e+03, threshold=6.549e+02, percent-clipped=10.0 2023-05-19 01:53:18,532 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335607.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:26,816 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7876, 3.1591, 3.8602, 4.6665, 4.1873, 4.8688, 4.1994, 3.5151], device='cuda:1'), covar=tensor([0.0041, 0.0350, 0.0133, 0.0053, 0.0119, 0.0065, 0.0127, 0.0326], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0125, 0.0106, 0.0083, 0.0106, 0.0119, 0.0105, 0.0139], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:53:46,340 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0352, 4.8848, 4.8245, 4.8417, 4.5381, 5.0221, 5.0556, 5.1414], device='cuda:1'), covar=tensor([0.0264, 0.0182, 0.0193, 0.0373, 0.0752, 0.0358, 0.0159, 0.0225], device='cuda:1'), in_proj_covar=tensor([0.0207, 0.0208, 0.0202, 0.0258, 0.0249, 0.0232, 0.0187, 0.0245], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 01:53:46,915 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335647.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:53:47,592 INFO [finetune.py:992] (1/2) Epoch 20, batch 3200, loss[loss=0.1439, simple_loss=0.2385, pruned_loss=0.02458, over 12413.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.2497, pruned_loss=0.03481, over 2382000.76 frames. ], batch size: 32, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:53:50,931 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. 
limit=2.0 2023-05-19 01:53:52,582 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335655.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:54:02,924 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335670.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:54:15,760 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1452, 2.4583, 3.6997, 3.1024, 3.5062, 3.2066, 2.6514, 3.5869], device='cuda:1'), covar=tensor([0.0161, 0.0420, 0.0195, 0.0288, 0.0165, 0.0232, 0.0447, 0.0159], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0216, 0.0205, 0.0199, 0.0233, 0.0179, 0.0210, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:54:23,194 INFO [finetune.py:992] (1/2) Epoch 20, batch 3250, loss[loss=0.1504, simple_loss=0.2385, pruned_loss=0.03117, over 12193.00 frames. ], tot_loss[loss=0.1596, simple_loss=0.2496, pruned_loss=0.03478, over 2382798.17 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:54:29,527 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.418e+02 2.914e+02 3.419e+02 5.919e+02, threshold=5.828e+02, percent-clipped=0.0 2023-05-19 01:54:32,698 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 01:54:37,132 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335718.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:54:57,873 INFO [finetune.py:992] (1/2) Epoch 20, batch 3300, loss[loss=0.1606, simple_loss=0.256, pruned_loss=0.0326, over 12364.00 frames. ], tot_loss[loss=0.16, simple_loss=0.2501, pruned_loss=0.03496, over 2366934.20 frames. ], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:55:05,032 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1757, 4.6545, 4.1750, 4.7952, 4.3983, 2.9121, 4.1512, 3.0998], device='cuda:1'), covar=tensor([0.0939, 0.0682, 0.1300, 0.0623, 0.1328, 0.1826, 0.1091, 0.3501], device='cuda:1'), in_proj_covar=tensor([0.0324, 0.0394, 0.0376, 0.0356, 0.0386, 0.0289, 0.0362, 0.0381], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:55:33,087 INFO [finetune.py:992] (1/2) Epoch 20, batch 3350, loss[loss=0.1603, simple_loss=0.2566, pruned_loss=0.03196, over 10302.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2503, pruned_loss=0.03512, over 2366437.99 frames. ], batch size: 68, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:55:39,838 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.787e+02 2.725e+02 3.133e+02 3.598e+02 6.227e+02, threshold=6.266e+02, percent-clipped=1.0 2023-05-19 01:55:40,974 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.65 vs. limit=2.0 2023-05-19 01:56:00,103 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=335835.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:06,977 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=335845.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:08,899 INFO [finetune.py:992] (1/2) Epoch 20, batch 3400, loss[loss=0.159, simple_loss=0.2505, pruned_loss=0.03377, over 12356.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.2499, pruned_loss=0.0352, over 2367618.27 frames. 
], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:56:14,078 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2575, 5.0622, 5.2109, 5.2351, 4.8505, 4.9338, 4.6362, 5.1365], device='cuda:1'), covar=tensor([0.0723, 0.0701, 0.0865, 0.0598, 0.2038, 0.1400, 0.0617, 0.1185], device='cuda:1'), in_proj_covar=tensor([0.0579, 0.0755, 0.0656, 0.0668, 0.0907, 0.0791, 0.0601, 0.0513], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-19 01:56:40,355 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=335893.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:42,600 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=335896.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 01:56:43,804 INFO [finetune.py:992] (1/2) Epoch 20, batch 3450, loss[loss=0.1751, simple_loss=0.277, pruned_loss=0.03661, over 12191.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2496, pruned_loss=0.03501, over 2369700.02 frames. ], batch size: 35, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:56:50,223 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.551e+02 2.981e+02 3.493e+02 8.017e+02, threshold=5.963e+02, percent-clipped=1.0 2023-05-19 01:56:57,543 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.9079, 5.8997, 5.6850, 5.1419, 5.0893, 5.8128, 5.4327, 5.1456], device='cuda:1'), covar=tensor([0.0875, 0.1025, 0.0754, 0.1931, 0.0862, 0.0828, 0.1766, 0.1063], device='cuda:1'), in_proj_covar=tensor([0.0669, 0.0600, 0.0553, 0.0684, 0.0450, 0.0775, 0.0824, 0.0600], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:1') 2023-05-19 01:57:03,271 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1554, 2.1042, 2.3748, 2.2535, 2.3290, 2.4466, 1.9914, 2.4205], device='cuda:1'), covar=tensor([0.0140, 0.0309, 0.0180, 0.0217, 0.0205, 0.0190, 0.0307, 0.0202], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0213, 0.0202, 0.0198, 0.0230, 0.0177, 0.0208, 0.0204], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:57:17,466 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-19 01:57:18,569 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1063, 4.9303, 4.8968, 4.8557, 4.6106, 5.0060, 5.1207, 5.2644], device='cuda:1'), covar=tensor([0.0256, 0.0192, 0.0211, 0.0458, 0.0790, 0.0425, 0.0181, 0.0196], device='cuda:1'), in_proj_covar=tensor([0.0206, 0.0207, 0.0201, 0.0258, 0.0248, 0.0231, 0.0186, 0.0243], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 01:57:19,793 INFO [finetune.py:992] (1/2) Epoch 20, batch 3500, loss[loss=0.1399, simple_loss=0.2306, pruned_loss=0.02463, over 12114.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.2495, pruned_loss=0.03478, over 2361899.17 frames. 
], batch size: 33, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:57:19,994 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1287, 4.7250, 4.7125, 4.9811, 4.7479, 4.9728, 4.8689, 2.7836], device='cuda:1'), covar=tensor([0.0088, 0.0080, 0.0109, 0.0060, 0.0064, 0.0114, 0.0089, 0.0876], device='cuda:1'), in_proj_covar=tensor([0.0074, 0.0086, 0.0090, 0.0078, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:57:50,119 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 01:57:55,094 INFO [finetune.py:992] (1/2) Epoch 20, batch 3550, loss[loss=0.1635, simple_loss=0.25, pruned_loss=0.03851, over 12133.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2502, pruned_loss=0.03523, over 2363263.50 frames. ], batch size: 38, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:58:04,242 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.683e+02 3.263e+02 3.799e+02 1.675e+03, threshold=6.525e+02, percent-clipped=3.0 2023-05-19 01:58:09,671 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-05-19 01:58:10,707 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6585, 4.4360, 4.4748, 4.4287, 4.1420, 4.6166, 4.6578, 4.7511], device='cuda:1'), covar=tensor([0.0267, 0.0203, 0.0216, 0.0447, 0.0790, 0.0379, 0.0189, 0.0224], device='cuda:1'), in_proj_covar=tensor([0.0206, 0.0207, 0.0201, 0.0258, 0.0248, 0.0231, 0.0187, 0.0243], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 01:58:32,378 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2942, 4.8222, 4.2477, 5.0690, 4.5210, 3.1449, 4.2330, 3.1581], device='cuda:1'), covar=tensor([0.0869, 0.0710, 0.1507, 0.0536, 0.1222, 0.1658, 0.1176, 0.3363], device='cuda:1'), in_proj_covar=tensor([0.0324, 0.0393, 0.0376, 0.0355, 0.0386, 0.0288, 0.0362, 0.0381], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 01:58:32,816 INFO [finetune.py:992] (1/2) Epoch 20, batch 3600, loss[loss=0.132, simple_loss=0.2165, pruned_loss=0.02374, over 12186.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2501, pruned_loss=0.03526, over 2368494.95 frames. ], batch size: 31, lr: 3.10e-03, grad_scale: 8.0 2023-05-19 01:59:08,304 INFO [finetune.py:992] (1/2) Epoch 20, batch 3650, loss[loss=0.1622, simple_loss=0.2548, pruned_loss=0.0348, over 11203.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2495, pruned_loss=0.03506, over 2372251.63 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 01:59:14,610 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.720e+02 3.014e+02 3.515e+02 1.415e+03, threshold=6.028e+02, percent-clipped=3.0 2023-05-19 01:59:21,767 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0992, 4.7262, 4.7199, 4.9578, 4.7706, 5.0190, 4.8409, 2.7720], device='cuda:1'), covar=tensor([0.0095, 0.0087, 0.0116, 0.0063, 0.0057, 0.0097, 0.0098, 0.0869], device='cuda:1'), in_proj_covar=tensor([0.0074, 0.0086, 0.0090, 0.0079, 0.0065, 0.0100, 0.0087, 0.0105], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 01:59:24,850 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. 
limit=2.0 2023-05-19 01:59:43,829 INFO [finetune.py:992] (1/2) Epoch 20, batch 3700, loss[loss=0.1658, simple_loss=0.262, pruned_loss=0.03475, over 12361.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2498, pruned_loss=0.03494, over 2368658.99 frames. ], batch size: 36, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 01:59:51,183 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5214, 2.7346, 3.6445, 4.4677, 3.9391, 4.5461, 3.8524, 3.0990], device='cuda:1'), covar=tensor([0.0039, 0.0388, 0.0149, 0.0047, 0.0117, 0.0077, 0.0158, 0.0405], device='cuda:1'), in_proj_covar=tensor([0.0092, 0.0125, 0.0106, 0.0084, 0.0107, 0.0119, 0.0106, 0.0140], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:00:14,027 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=336191.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:00:18,880 INFO [finetune.py:992] (1/2) Epoch 20, batch 3750, loss[loss=0.1743, simple_loss=0.2617, pruned_loss=0.04347, over 12036.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.2499, pruned_loss=0.03481, over 2378122.36 frames. ], batch size: 40, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:00:23,898 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.30 vs. limit=5.0 2023-05-19 02:00:25,517 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.588e+02 2.947e+02 3.372e+02 5.172e+02, threshold=5.894e+02, percent-clipped=0.0 2023-05-19 02:00:54,649 INFO [finetune.py:992] (1/2) Epoch 20, batch 3800, loss[loss=0.1667, simple_loss=0.2612, pruned_loss=0.03611, over 10540.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.2498, pruned_loss=0.03479, over 2374511.89 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:01:09,139 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9001, 4.5504, 4.4620, 4.7454, 4.7604, 4.8202, 4.6454, 2.5576], device='cuda:1'), covar=tensor([0.0169, 0.0139, 0.0214, 0.0123, 0.0080, 0.0186, 0.0194, 0.1131], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0087, 0.0091, 0.0079, 0.0066, 0.0100, 0.0088, 0.0106], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:01:29,905 INFO [finetune.py:992] (1/2) Epoch 20, batch 3850, loss[loss=0.1469, simple_loss=0.2312, pruned_loss=0.03128, over 12186.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.2499, pruned_loss=0.0349, over 2378423.26 frames. ], batch size: 29, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:01:35,903 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.561e+02 2.934e+02 3.404e+02 8.064e+02, threshold=5.868e+02, percent-clipped=2.0 2023-05-19 02:01:38,292 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4042, 5.1821, 5.3512, 5.3856, 5.0212, 5.0827, 4.7491, 5.3154], device='cuda:1'), covar=tensor([0.0689, 0.0682, 0.0862, 0.0580, 0.2049, 0.1321, 0.0590, 0.1128], device='cuda:1'), in_proj_covar=tensor([0.0584, 0.0761, 0.0658, 0.0669, 0.0916, 0.0799, 0.0605, 0.0517], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 02:02:00,922 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-05-19 02:02:04,521 INFO [finetune.py:992] (1/2) Epoch 20, batch 3900, loss[loss=0.1433, simple_loss=0.2349, pruned_loss=0.02592, over 12194.00 frames. 
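The per-batch loss entries above report three values: loss, simple_loss, and pruned_loss. Across this log the reported loss agrees with 0.5 * simple_loss + pruned_loss to within rounding; the 0.5 weighting is inferred from the logged numbers, not taken from the training script. A minimal check using values copied from the entries above:

simple_loss_scale = 0.5  # inferred weighting, not confirmed by the script

entries = [
    # (loss, simple_loss, pruned_loss) as logged for batches 2900, 2950 and 3900
    (0.1919, 0.2782, 0.05273),
    (0.1340, 0.2219, 0.02308),
    (0.1433, 0.2349, 0.02592),
]

for loss, simple_loss, pruned_loss in entries:
    recomputed = simple_loss_scale * simple_loss + pruned_loss
    print(f"logged={loss:.4f}  recomputed={recomputed:.4f}  diff={abs(loss - recomputed):.1e}")

The tot_loss fields appear to be the same three quantities averaged over the larger frame counts shown, and they satisfy the same relation.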
], tot_loss[loss=0.16, simple_loss=0.2502, pruned_loss=0.03489, over 2376451.47 frames. ], batch size: 31, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:02:39,988 INFO [finetune.py:992] (1/2) Epoch 20, batch 3950, loss[loss=0.1779, simple_loss=0.2644, pruned_loss=0.04569, over 12258.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2514, pruned_loss=0.03543, over 2374606.70 frames. ], batch size: 32, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:02:44,528 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=336404.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:02:45,303 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3014, 5.0216, 5.1354, 5.2284, 4.9848, 5.2544, 5.0975, 2.9344], device='cuda:1'), covar=tensor([0.0079, 0.0065, 0.0069, 0.0048, 0.0047, 0.0086, 0.0068, 0.0727], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0087, 0.0091, 0.0079, 0.0066, 0.0101, 0.0088, 0.0106], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:02:46,574 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.690e+02 3.132e+02 3.710e+02 7.382e+02, threshold=6.264e+02, percent-clipped=1.0 2023-05-19 02:03:14,992 INFO [finetune.py:992] (1/2) Epoch 20, batch 4000, loss[loss=0.1763, simple_loss=0.267, pruned_loss=0.04277, over 12117.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2507, pruned_loss=0.03526, over 2371593.87 frames. ], batch size: 39, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:03:27,141 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=336465.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:03:32,645 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.9619, 6.0193, 5.7345, 5.2162, 5.2178, 5.8621, 5.5196, 5.2686], device='cuda:1'), covar=tensor([0.0965, 0.0964, 0.0753, 0.1768, 0.0733, 0.0901, 0.1717, 0.1061], device='cuda:1'), in_proj_covar=tensor([0.0672, 0.0602, 0.0554, 0.0686, 0.0452, 0.0779, 0.0831, 0.0602], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0003], device='cuda:1') 2023-05-19 02:03:45,125 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=336491.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:03:49,726 INFO [finetune.py:992] (1/2) Epoch 20, batch 4050, loss[loss=0.1621, simple_loss=0.2552, pruned_loss=0.03449, over 12038.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2509, pruned_loss=0.03532, over 2372992.20 frames. 
], batch size: 42, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:03:55,579 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8080, 2.7877, 4.3663, 4.6291, 2.8852, 2.6360, 2.9485, 2.2220], device='cuda:1'), covar=tensor([0.1664, 0.3110, 0.0569, 0.0434, 0.1393, 0.2679, 0.2853, 0.4207], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0401, 0.0287, 0.0316, 0.0288, 0.0333, 0.0415, 0.0389], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:03:56,643 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.492e+02 2.845e+02 3.416e+02 7.889e+02, threshold=5.690e+02, percent-clipped=2.0 2023-05-19 02:04:09,764 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1161, 2.4717, 3.6982, 3.0449, 3.4323, 3.2154, 2.6090, 3.5492], device='cuda:1'), covar=tensor([0.0185, 0.0422, 0.0157, 0.0309, 0.0171, 0.0201, 0.0420, 0.0154], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0216, 0.0205, 0.0201, 0.0233, 0.0178, 0.0211, 0.0206], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:04:13,599 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 02:04:19,327 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=336539.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:04:25,526 INFO [finetune.py:992] (1/2) Epoch 20, batch 4100, loss[loss=0.1358, simple_loss=0.2221, pruned_loss=0.02478, over 12273.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2512, pruned_loss=0.03543, over 2370043.15 frames. ], batch size: 28, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:05:00,052 INFO [finetune.py:992] (1/2) Epoch 20, batch 4150, loss[loss=0.1569, simple_loss=0.2506, pruned_loss=0.0316, over 12296.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2521, pruned_loss=0.03566, over 2368095.52 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:05:06,641 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.661e+02 2.584e+02 3.123e+02 3.691e+02 5.062e+02, threshold=6.246e+02, percent-clipped=0.0 2023-05-19 02:05:12,850 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2170, 6.1332, 5.7977, 5.6756, 6.1940, 5.5417, 5.6760, 5.6968], device='cuda:1'), covar=tensor([0.1396, 0.0770, 0.0986, 0.1657, 0.0686, 0.2111, 0.1705, 0.1190], device='cuda:1'), in_proj_covar=tensor([0.0375, 0.0521, 0.0422, 0.0466, 0.0477, 0.0460, 0.0416, 0.0401], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:05:35,524 INFO [finetune.py:992] (1/2) Epoch 20, batch 4200, loss[loss=0.1839, simple_loss=0.2763, pruned_loss=0.04571, over 12357.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2522, pruned_loss=0.0356, over 2376197.37 frames. ], batch size: 35, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:06:10,477 INFO [finetune.py:992] (1/2) Epoch 20, batch 4250, loss[loss=0.1761, simple_loss=0.2667, pruned_loss=0.04273, over 12115.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2521, pruned_loss=0.03559, over 2373766.72 frames. 
], batch size: 38, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:06:16,574 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.659e+02 3.290e+02 3.911e+02 6.894e+02, threshold=6.580e+02, percent-clipped=3.0 2023-05-19 02:06:17,500 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=336708.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:06:18,954 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1166, 3.9499, 2.6915, 2.3944, 3.5434, 2.4917, 3.5466, 2.9769], device='cuda:1'), covar=tensor([0.0741, 0.0696, 0.1140, 0.1648, 0.0327, 0.1407, 0.0635, 0.0875], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0269, 0.0184, 0.0209, 0.0150, 0.0190, 0.0207, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:06:44,830 INFO [finetune.py:992] (1/2) Epoch 20, batch 4300, loss[loss=0.1604, simple_loss=0.2459, pruned_loss=0.03745, over 12297.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2525, pruned_loss=0.036, over 2360780.11 frames. ], batch size: 33, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:06:53,305 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=336760.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:06:57,628 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2754, 2.7802, 3.8769, 3.2173, 3.6686, 3.3522, 2.7886, 3.7199], device='cuda:1'), covar=tensor([0.0191, 0.0424, 0.0171, 0.0309, 0.0157, 0.0245, 0.0468, 0.0157], device='cuda:1'), in_proj_covar=tensor([0.0196, 0.0216, 0.0205, 0.0200, 0.0232, 0.0178, 0.0210, 0.0205], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:06:59,719 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=336769.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:07:20,131 INFO [finetune.py:992] (1/2) Epoch 20, batch 4350, loss[loss=0.2142, simple_loss=0.2888, pruned_loss=0.06978, over 7784.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2515, pruned_loss=0.03578, over 2366440.06 frames. ], batch size: 98, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:07:26,569 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.528e+02 2.921e+02 3.513e+02 8.768e+02, threshold=5.842e+02, percent-clipped=3.0 2023-05-19 02:07:52,110 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.69 vs. limit=2.0 2023-05-19 02:07:55,853 INFO [finetune.py:992] (1/2) Epoch 20, batch 4400, loss[loss=0.1877, simple_loss=0.2761, pruned_loss=0.04962, over 12114.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2526, pruned_loss=0.03627, over 2357734.81 frames. ], batch size: 39, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:08:00,388 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 02:08:30,667 INFO [finetune.py:992] (1/2) Epoch 20, batch 4450, loss[loss=0.1603, simple_loss=0.2581, pruned_loss=0.03127, over 12346.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2513, pruned_loss=0.03556, over 2361992.41 frames. 
], batch size: 35, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:08:30,876 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=336898.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:08:36,940 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.654e+02 2.944e+02 3.497e+02 1.198e+03, threshold=5.888e+02, percent-clipped=2.0 2023-05-19 02:09:05,761 INFO [finetune.py:992] (1/2) Epoch 20, batch 4500, loss[loss=0.1459, simple_loss=0.2274, pruned_loss=0.03224, over 12351.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2516, pruned_loss=0.03529, over 2370735.20 frames. ], batch size: 30, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:09:13,601 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=336959.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:09:40,729 INFO [finetune.py:992] (1/2) Epoch 20, batch 4550, loss[loss=0.1648, simple_loss=0.2554, pruned_loss=0.03709, over 12024.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.03553, over 2365141.06 frames. ], batch size: 42, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:09:46,975 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 2.592e+02 3.019e+02 3.499e+02 5.895e+02, threshold=6.039e+02, percent-clipped=1.0 2023-05-19 02:10:15,102 INFO [finetune.py:992] (1/2) Epoch 20, batch 4600, loss[loss=0.1351, simple_loss=0.2139, pruned_loss=0.02816, over 12181.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2507, pruned_loss=0.03545, over 2372633.36 frames. ], batch size: 29, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:10:22,929 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5325, 4.9223, 3.2775, 3.0133, 4.3006, 2.8728, 4.2075, 3.5318], device='cuda:1'), covar=tensor([0.0668, 0.0633, 0.1096, 0.1445, 0.0275, 0.1320, 0.0525, 0.0782], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0267, 0.0181, 0.0207, 0.0149, 0.0189, 0.0205, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:10:23,554 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337060.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:10:25,645 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1399, 3.9272, 4.0533, 4.3837, 2.8121, 3.8868, 2.4450, 3.9640], device='cuda:1'), covar=tensor([0.1955, 0.0903, 0.0842, 0.0572, 0.1508, 0.0785, 0.2297, 0.0955], device='cuda:1'), in_proj_covar=tensor([0.0237, 0.0279, 0.0305, 0.0374, 0.0249, 0.0253, 0.0270, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:10:26,209 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337064.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:10:50,046 INFO [finetune.py:992] (1/2) Epoch 20, batch 4650, loss[loss=0.1363, simple_loss=0.2175, pruned_loss=0.02758, over 11830.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2506, pruned_loss=0.0354, over 2368751.61 frames. 
], batch size: 26, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:10:56,356 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.687e+02 2.925e+02 3.531e+02 6.038e+02, threshold=5.850e+02, percent-clipped=0.0 2023-05-19 02:10:57,100 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=337108.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:11:24,976 INFO [finetune.py:992] (1/2) Epoch 20, batch 4700, loss[loss=0.169, simple_loss=0.2573, pruned_loss=0.04041, over 12059.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2507, pruned_loss=0.03542, over 2373041.11 frames. ], batch size: 37, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:11:29,984 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8635, 3.7981, 3.3515, 3.2776, 3.0304, 2.9715, 3.8686, 2.5267], device='cuda:1'), covar=tensor([0.0397, 0.0171, 0.0237, 0.0235, 0.0459, 0.0432, 0.0143, 0.0529], device='cuda:1'), in_proj_covar=tensor([0.0200, 0.0170, 0.0176, 0.0200, 0.0208, 0.0206, 0.0182, 0.0211], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:11:43,321 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1241, 2.2875, 2.7488, 3.1356, 2.3002, 3.2005, 3.1719, 3.2971], device='cuda:1'), covar=tensor([0.0208, 0.1151, 0.0546, 0.0226, 0.1114, 0.0381, 0.0368, 0.0168], device='cuda:1'), in_proj_covar=tensor([0.0128, 0.0210, 0.0188, 0.0128, 0.0193, 0.0187, 0.0186, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:11:51,756 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2023-05-19 02:11:59,479 INFO [finetune.py:992] (1/2) Epoch 20, batch 4750, loss[loss=0.1614, simple_loss=0.2529, pruned_loss=0.0349, over 12309.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2516, pruned_loss=0.03584, over 2371061.29 frames. ], batch size: 34, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:12:06,131 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.728e+02 2.777e+02 3.175e+02 3.703e+02 5.644e+02, threshold=6.351e+02, percent-clipped=0.0 2023-05-19 02:12:16,181 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0437, 4.8879, 4.9112, 4.8807, 4.4911, 4.9840, 5.0372, 5.1475], device='cuda:1'), covar=tensor([0.0272, 0.0185, 0.0188, 0.0441, 0.0923, 0.0395, 0.0177, 0.0209], device='cuda:1'), in_proj_covar=tensor([0.0210, 0.0210, 0.0204, 0.0262, 0.0252, 0.0236, 0.0189, 0.0247], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:12:17,629 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0260, 4.5119, 3.9396, 4.8040, 4.2960, 2.8125, 4.0980, 2.8374], device='cuda:1'), covar=tensor([0.0982, 0.0818, 0.1596, 0.0577, 0.1389, 0.1903, 0.1150, 0.3749], device='cuda:1'), in_proj_covar=tensor([0.0322, 0.0389, 0.0372, 0.0353, 0.0383, 0.0285, 0.0360, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:12:22,634 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-19 02:12:35,565 INFO [finetune.py:992] (1/2) Epoch 20, batch 4800, loss[loss=0.133, simple_loss=0.2198, pruned_loss=0.02314, over 11800.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.252, pruned_loss=0.03609, over 2365371.40 frames. 
], batch size: 26, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:12:36,777 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-05-19 02:12:39,896 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337254.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:12:56,792 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. limit=2.0 2023-05-19 02:13:11,042 INFO [finetune.py:992] (1/2) Epoch 20, batch 4850, loss[loss=0.1699, simple_loss=0.2707, pruned_loss=0.03457, over 12144.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2513, pruned_loss=0.03564, over 2373317.76 frames. ], batch size: 36, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:13:11,160 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337298.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:13:17,210 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.560e+02 3.113e+02 3.760e+02 7.486e+02, threshold=6.227e+02, percent-clipped=4.0 2023-05-19 02:13:24,221 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9289, 4.6518, 4.6446, 4.8154, 4.6319, 4.8994, 4.7385, 2.7866], device='cuda:1'), covar=tensor([0.0094, 0.0076, 0.0107, 0.0067, 0.0055, 0.0094, 0.0085, 0.0839], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0087, 0.0091, 0.0080, 0.0067, 0.0101, 0.0089, 0.0106], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:13:45,754 INFO [finetune.py:992] (1/2) Epoch 20, batch 4900, loss[loss=0.1723, simple_loss=0.2567, pruned_loss=0.04393, over 12361.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.251, pruned_loss=0.0353, over 2380792.22 frames. ], batch size: 31, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:13:53,548 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337359.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:13:56,866 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337364.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:14:20,458 INFO [finetune.py:992] (1/2) Epoch 20, batch 4950, loss[loss=0.1581, simple_loss=0.2467, pruned_loss=0.03474, over 12097.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2518, pruned_loss=0.03544, over 2377596.47 frames. ], batch size: 32, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:14:26,846 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.693e+02 3.241e+02 3.801e+02 7.006e+02, threshold=6.482e+02, percent-clipped=2.0 2023-05-19 02:14:30,976 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=337412.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:14:31,872 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5098, 4.9055, 3.1808, 3.1093, 4.2405, 2.8663, 4.2214, 3.5273], device='cuda:1'), covar=tensor([0.0697, 0.0459, 0.1124, 0.1244, 0.0331, 0.1258, 0.0451, 0.0749], device='cuda:1'), in_proj_covar=tensor([0.0192, 0.0268, 0.0181, 0.0206, 0.0149, 0.0189, 0.0205, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:14:38,870 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.11 vs. 
limit=2.0 2023-05-19 02:14:39,412 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337424.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:14:55,658 INFO [finetune.py:992] (1/2) Epoch 20, batch 5000, loss[loss=0.149, simple_loss=0.2481, pruned_loss=0.02491, over 12190.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2516, pruned_loss=0.03536, over 2384790.74 frames. ], batch size: 35, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:15:21,272 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337485.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:15:26,743 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7082, 3.7255, 3.3639, 3.1871, 3.0151, 2.8935, 3.7732, 2.5247], device='cuda:1'), covar=tensor([0.0426, 0.0172, 0.0228, 0.0253, 0.0465, 0.0429, 0.0160, 0.0538], device='cuda:1'), in_proj_covar=tensor([0.0200, 0.0170, 0.0176, 0.0200, 0.0208, 0.0206, 0.0183, 0.0211], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:15:29,960 INFO [finetune.py:992] (1/2) Epoch 20, batch 5050, loss[loss=0.2363, simple_loss=0.3034, pruned_loss=0.0846, over 8023.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2518, pruned_loss=0.03593, over 2366150.01 frames. ], batch size: 97, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:15:36,370 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.518e+02 2.863e+02 3.518e+02 6.980e+02, threshold=5.726e+02, percent-clipped=2.0 2023-05-19 02:16:06,006 INFO [finetune.py:992] (1/2) Epoch 20, batch 5100, loss[loss=0.1546, simple_loss=0.2499, pruned_loss=0.02965, over 12361.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2512, pruned_loss=0.03582, over 2358752.18 frames. ], batch size: 35, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:16:10,344 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337554.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:33,173 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337587.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:40,683 INFO [finetune.py:992] (1/2) Epoch 20, batch 5150, loss[loss=0.1498, simple_loss=0.2324, pruned_loss=0.03364, over 12180.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2502, pruned_loss=0.03547, over 2370428.93 frames. ], batch size: 31, lr: 3.09e-03, grad_scale: 16.0 2023-05-19 02:16:40,869 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:43,757 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=337602.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:16:47,081 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.559e+02 2.941e+02 3.528e+02 7.913e+02, threshold=5.882e+02, percent-clipped=2.0 2023-05-19 02:17:15,273 INFO [finetune.py:992] (1/2) Epoch 20, batch 5200, loss[loss=0.1564, simple_loss=0.2485, pruned_loss=0.03211, over 11577.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2506, pruned_loss=0.03561, over 2365307.90 frames. 
], batch size: 48, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:17:15,483 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337648.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:17:18,204 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0476, 4.6656, 4.7426, 4.9941, 4.7004, 5.0173, 4.8738, 2.4169], device='cuda:1'), covar=tensor([0.0101, 0.0092, 0.0113, 0.0060, 0.0057, 0.0098, 0.0087, 0.1024], device='cuda:1'), in_proj_covar=tensor([0.0076, 0.0087, 0.0092, 0.0080, 0.0067, 0.0102, 0.0089, 0.0106], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:17:18,454 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.90 vs. limit=5.0 2023-05-19 02:17:19,452 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337654.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:17:23,733 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337659.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:17:49,353 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7087, 2.8609, 4.6188, 4.7066, 2.7849, 2.6330, 3.0379, 2.3148], device='cuda:1'), covar=tensor([0.1827, 0.3107, 0.0491, 0.0462, 0.1499, 0.2698, 0.2820, 0.3997], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0401, 0.0287, 0.0315, 0.0288, 0.0331, 0.0414, 0.0387], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:17:51,182 INFO [finetune.py:992] (1/2) Epoch 20, batch 5250, loss[loss=0.131, simple_loss=0.2187, pruned_loss=0.02161, over 12291.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.25, pruned_loss=0.03512, over 2372262.15 frames. ], batch size: 28, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:17:58,059 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.650e+02 2.634e+02 3.132e+02 3.946e+02 9.591e+02, threshold=6.265e+02, percent-clipped=4.0 2023-05-19 02:18:18,186 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-05-19 02:18:25,738 INFO [finetune.py:992] (1/2) Epoch 20, batch 5300, loss[loss=0.1597, simple_loss=0.2518, pruned_loss=0.03384, over 12384.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.2496, pruned_loss=0.03496, over 2365579.01 frames. ], batch size: 38, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:18:48,141 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337780.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:19:00,749 INFO [finetune.py:992] (1/2) Epoch 20, batch 5350, loss[loss=0.1569, simple_loss=0.2581, pruned_loss=0.02785, over 12358.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2507, pruned_loss=0.03491, over 2373760.05 frames. ], batch size: 35, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:19:08,832 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.541e+02 2.870e+02 3.521e+02 6.072e+02, threshold=5.739e+02, percent-clipped=0.0 2023-05-19 02:19:37,041 INFO [finetune.py:992] (1/2) Epoch 20, batch 5400, loss[loss=0.1458, simple_loss=0.235, pruned_loss=0.02832, over 12290.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2515, pruned_loss=0.03531, over 2372070.61 frames. 
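The grad_scale field in these entries moves between 8.0 and 16.0 (16.0 from batch 4150, back to 8.0 by batch 5200), which is consistent with a dynamic mixed-precision loss scaler that doubles its scale after a run of overflow-free steps and halves it when inf/nan gradients appear. A minimal sketch of such a setup with torch.cuda.amp.GradScaler; the init_scale and the surrounding names (model, optimizer, batch) are illustrative, not taken from the training script.

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,      # illustrative starting value matching the log
    growth_factor=2.0,   # scale doubles after growth_interval overflow-free steps
    backoff_factor=0.5,  # scale halves when inf/nan gradients are found
)

# Schematic fp16 training step (model, optimizer and batch are hypothetical):
# with torch.cuda.amp.autocast():
#     loss = model(batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()  # grows or backs off the scale, as seen in grad_scale above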
], batch size: 33, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:19:39,982 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337852.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:11,504 INFO [finetune.py:992] (1/2) Epoch 20, batch 5450, loss[loss=0.1333, simple_loss=0.2194, pruned_loss=0.02362, over 12167.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2515, pruned_loss=0.03531, over 2372226.13 frames. ], batch size: 29, lr: 3.09e-03, grad_scale: 8.0 2023-05-19 02:20:14,338 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=337902.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:18,429 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.886e+02 2.713e+02 3.124e+02 3.875e+02 8.180e+02, threshold=6.247e+02, percent-clipped=4.0 2023-05-19 02:20:22,094 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337913.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:43,171 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337943.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:46,510 INFO [finetune.py:992] (1/2) Epoch 20, batch 5500, loss[loss=0.1696, simple_loss=0.2684, pruned_loss=0.03545, over 12146.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2515, pruned_loss=0.03532, over 2370138.72 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:20:50,702 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=337954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:50,780 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=337954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:20:57,791 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=337963.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:21:05,587 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4421, 4.7228, 2.9921, 2.8247, 4.0913, 2.6688, 3.9707, 3.3038], device='cuda:1'), covar=tensor([0.0722, 0.0557, 0.1102, 0.1494, 0.0339, 0.1331, 0.0524, 0.0862], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0269, 0.0182, 0.0208, 0.0150, 0.0190, 0.0207, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:21:22,349 INFO [finetune.py:992] (1/2) Epoch 20, batch 5550, loss[loss=0.1515, simple_loss=0.2363, pruned_loss=0.03333, over 12184.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2512, pruned_loss=0.03551, over 2372022.39 frames. 
], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:21:28,297 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338002.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:21:32,379 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.793e+02 2.574e+02 2.959e+02 3.514e+02 7.470e+02, threshold=5.917e+02, percent-clipped=1.0 2023-05-19 02:21:58,261 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3843, 2.4869, 3.0646, 4.1771, 2.5213, 4.2913, 4.3322, 4.3539], device='cuda:1'), covar=tensor([0.0153, 0.1452, 0.0604, 0.0196, 0.1360, 0.0286, 0.0198, 0.0130], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0212, 0.0189, 0.0129, 0.0194, 0.0188, 0.0188, 0.0131], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:21:58,961 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338046.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:22:00,137 INFO [finetune.py:992] (1/2) Epoch 20, batch 5600, loss[loss=0.1465, simple_loss=0.2393, pruned_loss=0.02682, over 12175.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2511, pruned_loss=0.03546, over 2376083.20 frames. ], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:22:22,137 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338080.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:22:22,150 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338080.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:22:24,135 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2518, 5.0654, 5.0514, 5.1056, 4.7520, 5.1890, 5.2274, 5.3153], device='cuda:1'), covar=tensor([0.0198, 0.0169, 0.0183, 0.0308, 0.0753, 0.0328, 0.0149, 0.0186], device='cuda:1'), in_proj_covar=tensor([0.0212, 0.0211, 0.0206, 0.0266, 0.0254, 0.0238, 0.0192, 0.0250], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:22:35,037 INFO [finetune.py:992] (1/2) Epoch 20, batch 5650, loss[loss=0.1844, simple_loss=0.2672, pruned_loss=0.05083, over 11593.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2514, pruned_loss=0.03534, over 2375896.17 frames. 
], batch size: 48, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:22:41,568 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338107.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:22:42,060 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 2.626e+02 3.148e+02 3.707e+02 9.372e+02, threshold=6.297e+02, percent-clipped=1.0 2023-05-19 02:22:43,686 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5311, 5.4917, 5.3081, 4.9185, 4.9135, 5.4402, 5.1104, 4.8485], device='cuda:1'), covar=tensor([0.0821, 0.0909, 0.0741, 0.1607, 0.1019, 0.0807, 0.1495, 0.0988], device='cuda:1'), in_proj_covar=tensor([0.0665, 0.0598, 0.0553, 0.0680, 0.0451, 0.0781, 0.0825, 0.0595], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:1') 2023-05-19 02:22:45,852 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0817, 4.9451, 4.9104, 4.9589, 4.6413, 5.0983, 5.0969, 5.2707], device='cuda:1'), covar=tensor([0.0319, 0.0180, 0.0206, 0.0333, 0.0774, 0.0333, 0.0193, 0.0193], device='cuda:1'), in_proj_covar=tensor([0.0211, 0.0211, 0.0205, 0.0265, 0.0253, 0.0238, 0.0192, 0.0249], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:22:56,062 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338128.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:23:05,179 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338141.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:23:07,180 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([6.1444, 6.1313, 5.9291, 5.4144, 5.3356, 6.0291, 5.6982, 5.4089], device='cuda:1'), covar=tensor([0.0688, 0.0924, 0.0691, 0.1652, 0.0666, 0.0730, 0.1393, 0.0973], device='cuda:1'), in_proj_covar=tensor([0.0667, 0.0601, 0.0555, 0.0683, 0.0453, 0.0783, 0.0829, 0.0597], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:1') 2023-05-19 02:23:10,530 INFO [finetune.py:992] (1/2) Epoch 20, batch 5700, loss[loss=0.14, simple_loss=0.2283, pruned_loss=0.02589, over 12118.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.2501, pruned_loss=0.03475, over 2383011.30 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:23:16,964 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0140, 3.9542, 4.0237, 4.0759, 3.8124, 3.8778, 3.6849, 3.9705], device='cuda:1'), covar=tensor([0.1197, 0.0780, 0.1265, 0.0727, 0.1837, 0.1322, 0.0738, 0.1166], device='cuda:1'), in_proj_covar=tensor([0.0576, 0.0754, 0.0660, 0.0670, 0.0910, 0.0790, 0.0604, 0.0513], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:23:28,780 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4768, 5.2567, 5.3733, 5.3986, 5.0227, 5.0802, 4.7561, 5.3371], device='cuda:1'), covar=tensor([0.0715, 0.0607, 0.0876, 0.0649, 0.2020, 0.1521, 0.0621, 0.1087], device='cuda:1'), in_proj_covar=tensor([0.0576, 0.0753, 0.0660, 0.0669, 0.0909, 0.0789, 0.0603, 0.0512], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:23:45,267 INFO [finetune.py:992] (1/2) Epoch 20, batch 5750, loss[loss=0.141, simple_loss=0.2236, pruned_loss=0.02915, over 12293.00 frames. 
], tot_loss[loss=0.1601, simple_loss=0.2504, pruned_loss=0.0349, over 2383927.90 frames. ], batch size: 28, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:23:52,363 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.434e+02 2.922e+02 3.324e+02 6.563e+02, threshold=5.844e+02, percent-clipped=1.0 2023-05-19 02:23:52,461 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338208.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:23:53,197 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5613, 5.4053, 5.5136, 5.5061, 5.1690, 5.2123, 4.8976, 5.4802], device='cuda:1'), covar=tensor([0.0771, 0.0610, 0.0849, 0.0682, 0.2024, 0.1438, 0.0603, 0.1082], device='cuda:1'), in_proj_covar=tensor([0.0577, 0.0752, 0.0660, 0.0670, 0.0908, 0.0789, 0.0603, 0.0513], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:24:00,396 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9674, 4.1590, 3.8365, 4.5694, 3.2116, 4.0898, 2.6772, 4.4039], device='cuda:1'), covar=tensor([0.1053, 0.0604, 0.1307, 0.0765, 0.0986, 0.0522, 0.1642, 0.0927], device='cuda:1'), in_proj_covar=tensor([0.0232, 0.0274, 0.0301, 0.0365, 0.0245, 0.0250, 0.0265, 0.0372], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:24:17,641 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338243.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:21,009 INFO [finetune.py:992] (1/2) Epoch 20, batch 5800, loss[loss=0.1585, simple_loss=0.2537, pruned_loss=0.0316, over 12046.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2503, pruned_loss=0.03518, over 2382559.06 frames. ], batch size: 40, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:24:22,764 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.26 vs. limit=5.0 2023-05-19 02:24:25,402 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338254.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:28,110 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338258.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:36,612 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7469, 2.2803, 3.1839, 3.7206, 3.4746, 3.7772, 3.4589, 2.7468], device='cuda:1'), covar=tensor([0.0066, 0.0433, 0.0195, 0.0068, 0.0137, 0.0100, 0.0147, 0.0393], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0125, 0.0106, 0.0084, 0.0107, 0.0121, 0.0105, 0.0139], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:24:51,054 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338291.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:24:56,406 INFO [finetune.py:992] (1/2) Epoch 20, batch 5850, loss[loss=0.1583, simple_loss=0.2473, pruned_loss=0.0346, over 12257.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2509, pruned_loss=0.03538, over 2379649.80 frames. 
], batch size: 32, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:24:59,224 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338302.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:25:01,622 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.2036, 4.5134, 3.9660, 4.8623, 4.3953, 2.9623, 4.1622, 2.9393], device='cuda:1'), covar=tensor([0.0877, 0.0865, 0.1658, 0.0613, 0.1318, 0.1697, 0.1138, 0.3697], device='cuda:1'), in_proj_covar=tensor([0.0323, 0.0391, 0.0374, 0.0354, 0.0385, 0.0286, 0.0360, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:25:03,453 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.653e+02 3.094e+02 3.992e+02 1.662e+03, threshold=6.188e+02, percent-clipped=6.0 2023-05-19 02:25:17,426 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5559, 3.5449, 3.1227, 3.0922, 2.8783, 2.7170, 3.6284, 2.4353], device='cuda:1'), covar=tensor([0.0437, 0.0180, 0.0264, 0.0247, 0.0469, 0.0416, 0.0166, 0.0524], device='cuda:1'), in_proj_covar=tensor([0.0202, 0.0172, 0.0178, 0.0202, 0.0208, 0.0207, 0.0184, 0.0212], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:25:22,181 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0623, 4.8621, 4.8506, 4.8757, 4.5769, 5.0181, 5.0163, 5.1775], device='cuda:1'), covar=tensor([0.0226, 0.0192, 0.0195, 0.0411, 0.0825, 0.0403, 0.0185, 0.0192], device='cuda:1'), in_proj_covar=tensor([0.0211, 0.0211, 0.0205, 0.0265, 0.0253, 0.0238, 0.0192, 0.0249], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:25:28,398 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4051, 4.6542, 3.0554, 2.4555, 4.1551, 2.6162, 4.0685, 3.1840], device='cuda:1'), covar=tensor([0.0646, 0.0518, 0.1098, 0.1752, 0.0276, 0.1396, 0.0425, 0.0870], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0269, 0.0182, 0.0207, 0.0150, 0.0190, 0.0207, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:25:30,904 INFO [finetune.py:992] (1/2) Epoch 20, batch 5900, loss[loss=0.1385, simple_loss=0.2234, pruned_loss=0.02684, over 12184.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.251, pruned_loss=0.03534, over 2382455.31 frames. ], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:25:43,704 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0487, 2.3125, 2.9478, 3.8551, 2.2039, 4.0244, 3.9945, 4.0394], device='cuda:1'), covar=tensor([0.0142, 0.1341, 0.0553, 0.0228, 0.1421, 0.0282, 0.0197, 0.0139], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0211, 0.0189, 0.0130, 0.0192, 0.0188, 0.0187, 0.0131], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:25:46,653 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.82 vs. 
limit=5.0 2023-05-19 02:25:49,976 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3777, 4.9092, 5.3726, 4.7064, 5.0822, 4.8431, 5.3911, 5.0160], device='cuda:1'), covar=tensor([0.0316, 0.0436, 0.0291, 0.0284, 0.0427, 0.0319, 0.0223, 0.0345], device='cuda:1'), in_proj_covar=tensor([0.0290, 0.0289, 0.0316, 0.0286, 0.0287, 0.0285, 0.0260, 0.0234], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:26:06,137 INFO [finetune.py:992] (1/2) Epoch 20, batch 5950, loss[loss=0.1675, simple_loss=0.2601, pruned_loss=0.03743, over 12291.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2513, pruned_loss=0.03513, over 2382224.72 frames. ], batch size: 34, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:26:08,872 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3987, 4.7096, 3.0899, 2.8404, 4.1538, 2.7125, 4.0062, 3.3557], device='cuda:1'), covar=tensor([0.0744, 0.0569, 0.1161, 0.1421, 0.0308, 0.1360, 0.0531, 0.0814], device='cuda:1'), in_proj_covar=tensor([0.0194, 0.0268, 0.0182, 0.0207, 0.0149, 0.0190, 0.0207, 0.0181], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:26:09,429 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338402.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:26:13,532 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.664e+02 2.624e+02 3.116e+02 3.722e+02 7.691e+02, threshold=6.232e+02, percent-clipped=1.0 2023-05-19 02:26:14,490 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338409.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:26:33,618 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338436.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:26:34,420 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.4035, 2.9189, 2.7495, 2.7336, 2.4388, 2.3976, 2.8843, 2.0997], device='cuda:1'), covar=tensor([0.0392, 0.0214, 0.0315, 0.0277, 0.0483, 0.0378, 0.0233, 0.0531], device='cuda:1'), in_proj_covar=tensor([0.0200, 0.0171, 0.0177, 0.0201, 0.0207, 0.0205, 0.0182, 0.0212], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:26:39,337 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9120, 3.2489, 2.5341, 2.3084, 3.0988, 2.3376, 3.1255, 2.6685], device='cuda:1'), covar=tensor([0.0621, 0.0640, 0.0894, 0.1381, 0.0272, 0.1197, 0.0617, 0.0787], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0270, 0.0183, 0.0208, 0.0150, 0.0191, 0.0208, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:26:41,810 INFO [finetune.py:992] (1/2) Epoch 20, batch 6000, loss[loss=0.1653, simple_loss=0.2616, pruned_loss=0.0345, over 11642.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2523, pruned_loss=0.03572, over 2368041.33 frames. 
], batch size: 48, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:26:41,810 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 02:26:55,840 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5821, 4.5748, 4.6482, 4.6590, 4.3047, 4.2421, 4.2316, 4.5710], device='cuda:1'), covar=tensor([0.0905, 0.0581, 0.0769, 0.0566, 0.1806, 0.1704, 0.0637, 0.0942], device='cuda:1'), in_proj_covar=tensor([0.0574, 0.0752, 0.0658, 0.0669, 0.0905, 0.0785, 0.0600, 0.0509], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:26:57,874 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.9304, 3.7959, 2.4976, 1.9688, 3.5047, 2.0471, 3.5358, 2.6169], device='cuda:1'), covar=tensor([0.0846, 0.0354, 0.1342, 0.2442, 0.0242, 0.1943, 0.0436, 0.1045], device='cuda:1'), in_proj_covar=tensor([0.0195, 0.0270, 0.0183, 0.0208, 0.0150, 0.0191, 0.0208, 0.0182], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:26:59,424 INFO [finetune.py:1026] (1/2) Epoch 20, validation: loss=0.3087, simple_loss=0.3852, pruned_loss=0.116, over 1020973.00 frames. 2023-05-19 02:26:59,425 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12856MB 2023-05-19 02:27:03,240 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.83 vs. limit=5.0 2023-05-19 02:27:09,548 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 02:27:15,497 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338470.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:27:22,432 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3747, 5.2178, 5.3509, 5.3541, 5.0192, 4.9935, 4.7731, 5.2837], device='cuda:1'), covar=tensor([0.0748, 0.0592, 0.0794, 0.0596, 0.1933, 0.1403, 0.0631, 0.1081], device='cuda:1'), in_proj_covar=tensor([0.0573, 0.0750, 0.0657, 0.0668, 0.0903, 0.0783, 0.0600, 0.0509], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:27:34,783 INFO [finetune.py:992] (1/2) Epoch 20, batch 6050, loss[loss=0.1605, simple_loss=0.2513, pruned_loss=0.03488, over 12134.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2515, pruned_loss=0.03505, over 2375522.59 frames. ], batch size: 39, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:27:41,665 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.577e+02 2.670e+02 3.111e+02 3.696e+02 7.955e+02, threshold=6.222e+02, percent-clipped=2.0 2023-05-19 02:27:41,785 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338508.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:27:48,637 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6079, 2.6747, 3.6518, 4.5975, 3.9566, 4.7417, 4.0221, 3.6183], device='cuda:1'), covar=tensor([0.0038, 0.0398, 0.0191, 0.0056, 0.0144, 0.0071, 0.0139, 0.0316], device='cuda:1'), in_proj_covar=tensor([0.0093, 0.0126, 0.0106, 0.0084, 0.0108, 0.0121, 0.0106, 0.0140], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:28:02,710 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. 
limit=2.0 2023-05-19 02:28:09,942 INFO [finetune.py:992] (1/2) Epoch 20, batch 6100, loss[loss=0.1753, simple_loss=0.2659, pruned_loss=0.04239, over 12346.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2517, pruned_loss=0.03548, over 2368840.35 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:28:15,469 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338556.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:16,893 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338558.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:21,908 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7662, 2.6919, 3.8360, 4.7833, 4.1632, 4.8244, 4.2128, 3.6770], device='cuda:1'), covar=tensor([0.0038, 0.0438, 0.0142, 0.0036, 0.0103, 0.0073, 0.0123, 0.0315], device='cuda:1'), in_proj_covar=tensor([0.0094, 0.0126, 0.0107, 0.0085, 0.0109, 0.0122, 0.0107, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:28:26,062 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3209, 2.3930, 3.0536, 4.1221, 2.1943, 4.2057, 4.2527, 4.2520], device='cuda:1'), covar=tensor([0.0168, 0.1481, 0.0579, 0.0203, 0.1555, 0.0310, 0.0224, 0.0157], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0211, 0.0189, 0.0130, 0.0193, 0.0189, 0.0188, 0.0131], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:28:29,618 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8129, 2.8868, 5.3502, 2.4873, 2.6374, 4.0801, 3.0079, 3.9077], device='cuda:1'), covar=tensor([0.0406, 0.1604, 0.0220, 0.1384, 0.2159, 0.1328, 0.1714, 0.1131], device='cuda:1'), in_proj_covar=tensor([0.0245, 0.0244, 0.0268, 0.0191, 0.0242, 0.0300, 0.0231, 0.0276], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:28:41,308 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6791, 2.8577, 4.3597, 4.4914, 2.7436, 2.5583, 2.8294, 2.1474], device='cuda:1'), covar=tensor([0.1829, 0.2995, 0.0551, 0.0493, 0.1470, 0.2723, 0.3052, 0.4185], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0403, 0.0289, 0.0316, 0.0289, 0.0331, 0.0415, 0.0389], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:28:44,426 INFO [finetune.py:992] (1/2) Epoch 20, batch 6150, loss[loss=0.1736, simple_loss=0.2618, pruned_loss=0.04266, over 12330.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2523, pruned_loss=0.03582, over 2365181.28 frames. ], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:28:44,609 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:50,332 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338606.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:28:51,691 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.539e+02 2.985e+02 3.635e+02 5.861e+02, threshold=5.970e+02, percent-clipped=0.0 2023-05-19 02:29:19,832 INFO [finetune.py:992] (1/2) Epoch 20, batch 6200, loss[loss=0.2446, simple_loss=0.3106, pruned_loss=0.08926, over 7926.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03603, over 2365955.42 frames. 
], batch size: 98, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:29:27,652 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338659.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:29:34,559 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338669.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:29:46,329 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.11 vs. limit=2.0 2023-05-19 02:29:51,658 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338693.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:29:54,803 INFO [finetune.py:992] (1/2) Epoch 20, batch 6250, loss[loss=0.2766, simple_loss=0.3384, pruned_loss=0.1074, over 8219.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2516, pruned_loss=0.03602, over 2366732.82 frames. ], batch size: 98, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:29:57,634 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338702.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:00,505 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338706.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:01,702 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.630e+02 2.964e+02 3.764e+02 6.429e+02, threshold=5.928e+02, percent-clipped=1.0 2023-05-19 02:30:17,065 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338730.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:20,374 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0934, 5.8542, 5.4298, 5.3576, 6.0346, 5.2154, 5.4185, 5.4564], device='cuda:1'), covar=tensor([0.1724, 0.1123, 0.1322, 0.2240, 0.0975, 0.2551, 0.2210, 0.1221], device='cuda:1'), in_proj_covar=tensor([0.0379, 0.0533, 0.0430, 0.0475, 0.0493, 0.0470, 0.0425, 0.0410], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:30:21,086 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=338736.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:25,308 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7071, 2.9128, 4.3777, 4.5697, 2.8113, 2.5571, 2.8451, 2.1905], device='cuda:1'), covar=tensor([0.1754, 0.2795, 0.0543, 0.0446, 0.1433, 0.2704, 0.3072, 0.4106], device='cuda:1'), in_proj_covar=tensor([0.0320, 0.0406, 0.0291, 0.0318, 0.0291, 0.0334, 0.0418, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:30:29,019 INFO [finetune.py:992] (1/2) Epoch 20, batch 6300, loss[loss=0.1748, simple_loss=0.2594, pruned_loss=0.04513, over 12359.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2525, pruned_loss=0.03614, over 2369134.88 frames. 
], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:30:30,495 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338750.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:33,514 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338754.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:35,497 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2754, 6.1108, 5.6285, 5.6789, 6.2129, 5.4784, 5.6667, 5.7061], device='cuda:1'), covar=tensor([0.1390, 0.0935, 0.1249, 0.1803, 0.0874, 0.2220, 0.2003, 0.1119], device='cuda:1'), in_proj_covar=tensor([0.0377, 0.0531, 0.0428, 0.0473, 0.0491, 0.0468, 0.0423, 0.0408], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:30:41,039 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338765.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:42,561 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338767.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:45,229 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2211, 2.7156, 3.7683, 3.2219, 3.6299, 3.3553, 2.8835, 3.7597], device='cuda:1'), covar=tensor([0.0179, 0.0403, 0.0206, 0.0274, 0.0199, 0.0238, 0.0392, 0.0130], device='cuda:1'), in_proj_covar=tensor([0.0200, 0.0221, 0.0210, 0.0206, 0.0239, 0.0183, 0.0214, 0.0210], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:30:48,847 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4806, 2.5160, 3.2249, 4.2791, 2.1558, 4.3903, 4.4389, 4.4173], device='cuda:1'), covar=tensor([0.0148, 0.1320, 0.0510, 0.0188, 0.1513, 0.0279, 0.0164, 0.0122], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0209, 0.0188, 0.0129, 0.0192, 0.0188, 0.0187, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:30:48,856 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338775.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:30:51,701 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.23 vs. limit=5.0 2023-05-19 02:30:54,652 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=338784.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:31:00,121 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5748, 5.4455, 5.5451, 5.5538, 5.1953, 5.2291, 5.0170, 5.4693], device='cuda:1'), covar=tensor([0.0739, 0.0547, 0.0779, 0.0632, 0.1804, 0.1323, 0.0569, 0.1074], device='cuda:1'), in_proj_covar=tensor([0.0582, 0.0759, 0.0664, 0.0674, 0.0913, 0.0792, 0.0606, 0.0516], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 02:31:04,116 INFO [finetune.py:992] (1/2) Epoch 20, batch 6350, loss[loss=0.2314, simple_loss=0.3095, pruned_loss=0.07661, over 7785.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.2527, pruned_loss=0.0363, over 2369322.96 frames. 
], batch size: 99, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:31:11,388 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.514e+02 2.979e+02 3.494e+02 1.002e+03, threshold=5.958e+02, percent-clipped=5.0 2023-05-19 02:31:13,092 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8828, 3.1217, 4.7181, 4.9414, 2.9170, 2.7606, 3.1022, 2.3861], device='cuda:1'), covar=tensor([0.1718, 0.3152, 0.0488, 0.0416, 0.1488, 0.2646, 0.2977, 0.4118], device='cuda:1'), in_proj_covar=tensor([0.0320, 0.0405, 0.0290, 0.0317, 0.0290, 0.0333, 0.0417, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:31:31,458 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338836.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:31:32,680 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7687, 2.3575, 3.1442, 2.7788, 3.0970, 2.9677, 2.3803, 3.1703], device='cuda:1'), covar=tensor([0.0185, 0.0416, 0.0213, 0.0304, 0.0180, 0.0260, 0.0431, 0.0164], device='cuda:1'), in_proj_covar=tensor([0.0198, 0.0219, 0.0208, 0.0204, 0.0236, 0.0182, 0.0213, 0.0208], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:31:34,277 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.11 vs. limit=5.0 2023-05-19 02:31:37,880 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-05-19 02:31:39,433 INFO [finetune.py:992] (1/2) Epoch 20, batch 6400, loss[loss=0.1703, simple_loss=0.2615, pruned_loss=0.0396, over 12285.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2524, pruned_loss=0.03608, over 2378868.91 frames. ], batch size: 37, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:31:45,381 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-05-19 02:31:50,711 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=338864.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:32:14,108 INFO [finetune.py:992] (1/2) Epoch 20, batch 6450, loss[loss=0.1361, simple_loss=0.2192, pruned_loss=0.0265, over 11867.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2523, pruned_loss=0.0362, over 2369513.03 frames. 
], batch size: 26, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:32:21,074 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.589e+02 2.941e+02 3.532e+02 8.176e+02, threshold=5.882e+02, percent-clipped=1.0 2023-05-19 02:32:24,762 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6336, 3.4490, 5.1976, 2.7941, 2.9158, 3.8074, 3.2004, 3.8641], device='cuda:1'), covar=tensor([0.0514, 0.1109, 0.0302, 0.1122, 0.1918, 0.1470, 0.1452, 0.1252], device='cuda:1'), in_proj_covar=tensor([0.0246, 0.0243, 0.0268, 0.0191, 0.0242, 0.0300, 0.0231, 0.0275], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:32:33,627 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=338925.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:32:41,750 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2541, 4.8376, 4.9277, 5.1018, 4.8850, 5.0927, 4.9492, 2.7905], device='cuda:1'), covar=tensor([0.0070, 0.0064, 0.0085, 0.0045, 0.0043, 0.0088, 0.0062, 0.0794], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0086, 0.0090, 0.0078, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:32:49,125 INFO [finetune.py:992] (1/2) Epoch 20, batch 6500, loss[loss=0.1603, simple_loss=0.2471, pruned_loss=0.03672, over 12116.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.253, pruned_loss=0.03602, over 2377581.36 frames. ], batch size: 38, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:32:53,232 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=338954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:33:15,533 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.54 vs. limit=5.0 2023-05-19 02:33:24,144 INFO [finetune.py:992] (1/2) Epoch 20, batch 6550, loss[loss=0.1736, simple_loss=0.2705, pruned_loss=0.03834, over 12039.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2534, pruned_loss=0.03627, over 2369898.87 frames. ], batch size: 40, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:33:26,990 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.47 vs. limit=2.0 2023-05-19 02:33:31,536 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.627e+02 3.179e+02 3.848e+02 7.308e+02, threshold=6.357e+02, percent-clipped=3.0 2023-05-19 02:33:43,526 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339025.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:33:59,387 INFO [finetune.py:992] (1/2) Epoch 20, batch 6600, loss[loss=0.1306, simple_loss=0.2164, pruned_loss=0.02247, over 12174.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2523, pruned_loss=0.03591, over 2364292.31 frames. 
], batch size: 29, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:34:00,885 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:01,048 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6339, 3.2937, 5.1446, 2.9739, 2.9006, 3.8471, 3.0878, 3.8311], device='cuda:1'), covar=tensor([0.0523, 0.1349, 0.0276, 0.1074, 0.2075, 0.1532, 0.1637, 0.1218], device='cuda:1'), in_proj_covar=tensor([0.0245, 0.0243, 0.0268, 0.0190, 0.0241, 0.0299, 0.0231, 0.0274], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:34:06,874 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-05-19 02:34:10,135 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339062.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:12,268 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339065.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:14,418 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339068.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:35,623 INFO [finetune.py:992] (1/2) Epoch 20, batch 6650, loss[loss=0.1489, simple_loss=0.2469, pruned_loss=0.02548, over 12104.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2529, pruned_loss=0.03612, over 2360920.60 frames. ], batch size: 33, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:34:37,277 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3229, 4.7694, 4.0721, 5.0283, 4.5482, 2.9877, 4.0445, 3.0236], device='cuda:1'), covar=tensor([0.0900, 0.0745, 0.1437, 0.0487, 0.1134, 0.1729, 0.1209, 0.3561], device='cuda:1'), in_proj_covar=tensor([0.0320, 0.0387, 0.0372, 0.0352, 0.0383, 0.0284, 0.0357, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:34:42,439 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.712e+02 2.578e+02 2.982e+02 3.642e+02 1.129e+03, threshold=5.965e+02, percent-clipped=1.0 2023-05-19 02:34:43,389 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0041, 3.5059, 5.3236, 2.8601, 3.0933, 3.8253, 3.4976, 3.7923], device='cuda:1'), covar=tensor([0.0366, 0.1174, 0.0310, 0.1183, 0.2014, 0.1743, 0.1295, 0.1503], device='cuda:1'), in_proj_covar=tensor([0.0246, 0.0243, 0.0269, 0.0191, 0.0241, 0.0301, 0.0232, 0.0275], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:34:45,835 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339113.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:57,246 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339129.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:34:58,574 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339131.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 02:35:10,320 INFO [finetune.py:992] (1/2) Epoch 20, batch 6700, loss[loss=0.1368, simple_loss=0.2149, pruned_loss=0.02937, over 11772.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2518, pruned_loss=0.03589, over 2361411.03 frames. 
], batch size: 26, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:35:11,189 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2308, 5.0872, 5.1717, 5.2083, 4.8761, 4.9207, 4.6609, 5.1175], device='cuda:1'), covar=tensor([0.0662, 0.0611, 0.0867, 0.0612, 0.1940, 0.1435, 0.0601, 0.1076], device='cuda:1'), in_proj_covar=tensor([0.0583, 0.0761, 0.0667, 0.0674, 0.0913, 0.0792, 0.0610, 0.0517], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 02:35:45,460 INFO [finetune.py:992] (1/2) Epoch 20, batch 6750, loss[loss=0.1907, simple_loss=0.2863, pruned_loss=0.04754, over 8178.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2525, pruned_loss=0.03612, over 2360593.29 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:35:52,626 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.574e+02 3.082e+02 3.669e+02 4.994e+02, threshold=6.164e+02, percent-clipped=0.0 2023-05-19 02:36:00,271 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-05-19 02:36:01,105 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339220.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:36:12,284 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7181, 2.7638, 4.4155, 4.5779, 2.8037, 2.5902, 2.9585, 2.2193], device='cuda:1'), covar=tensor([0.1878, 0.3533, 0.0511, 0.0496, 0.1495, 0.2842, 0.2942, 0.4335], device='cuda:1'), in_proj_covar=tensor([0.0319, 0.0404, 0.0289, 0.0317, 0.0289, 0.0332, 0.0416, 0.0390], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:36:21,027 INFO [finetune.py:992] (1/2) Epoch 20, batch 6800, loss[loss=0.1785, simple_loss=0.2757, pruned_loss=0.04061, over 12270.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2524, pruned_loss=0.03617, over 2365662.34 frames. ], batch size: 37, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:36:23,333 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7107, 2.8222, 4.2159, 4.4124, 2.9479, 2.5886, 2.9064, 2.2468], device='cuda:1'), covar=tensor([0.1790, 0.2844, 0.0559, 0.0520, 0.1330, 0.2702, 0.2916, 0.4177], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0403, 0.0288, 0.0317, 0.0288, 0.0332, 0.0416, 0.0389], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:36:25,219 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339254.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:36:35,243 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 02:36:49,864 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4954, 4.8887, 4.1514, 5.1027, 4.6977, 3.0872, 4.1343, 3.0603], device='cuda:1'), covar=tensor([0.0725, 0.0662, 0.1490, 0.0603, 0.1140, 0.1618, 0.1232, 0.3446], device='cuda:1'), in_proj_covar=tensor([0.0321, 0.0390, 0.0374, 0.0354, 0.0384, 0.0285, 0.0359, 0.0378], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:36:55,883 INFO [finetune.py:992] (1/2) Epoch 20, batch 6850, loss[loss=0.1504, simple_loss=0.2525, pruned_loss=0.02416, over 12195.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2521, pruned_loss=0.03601, over 2372150.91 frames. 
], batch size: 35, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:36:56,031 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339298.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:36:58,647 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339302.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:02,621 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.587e+02 2.617e+02 2.954e+02 3.584e+02 7.644e+02, threshold=5.907e+02, percent-clipped=3.0 2023-05-19 02:37:14,276 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339325.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:30,829 INFO [finetune.py:992] (1/2) Epoch 20, batch 6900, loss[loss=0.1232, simple_loss=0.2091, pruned_loss=0.01862, over 12384.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2523, pruned_loss=0.03616, over 2362710.72 frames. ], batch size: 30, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:37:31,693 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339349.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:38,842 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339359.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:40,813 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339362.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:37:48,081 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339373.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:00,896 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-05-19 02:38:02,671 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339393.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:05,371 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339397.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:06,034 INFO [finetune.py:992] (1/2) Epoch 20, batch 6950, loss[loss=0.164, simple_loss=0.2578, pruned_loss=0.03507, over 12088.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2525, pruned_loss=0.03621, over 2356984.50 frames. 
], batch size: 42, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:38:06,951 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2464, 5.0290, 5.0558, 5.0643, 4.7428, 5.1772, 5.2321, 5.3888], device='cuda:1'), covar=tensor([0.0186, 0.0171, 0.0173, 0.0335, 0.0741, 0.0293, 0.0135, 0.0140], device='cuda:1'), in_proj_covar=tensor([0.0211, 0.0211, 0.0204, 0.0264, 0.0252, 0.0237, 0.0191, 0.0248], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:38:08,749 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1433, 4.1027, 4.0600, 4.4662, 3.0640, 3.9390, 2.5727, 4.2443], device='cuda:1'), covar=tensor([0.1709, 0.0715, 0.0932, 0.0664, 0.1187, 0.0652, 0.1953, 0.0915], device='cuda:1'), in_proj_covar=tensor([0.0233, 0.0276, 0.0303, 0.0368, 0.0247, 0.0250, 0.0266, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:38:13,426 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.407e+02 2.960e+02 3.546e+02 6.369e+02, threshold=5.921e+02, percent-clipped=2.0 2023-05-19 02:38:14,867 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339410.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:24,449 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339424.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:25,195 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2131, 6.0076, 5.5856, 5.5344, 6.1051, 5.4093, 5.4559, 5.5369], device='cuda:1'), covar=tensor([0.1617, 0.0885, 0.1006, 0.2003, 0.0866, 0.2234, 0.2082, 0.1230], device='cuda:1'), in_proj_covar=tensor([0.0377, 0.0530, 0.0423, 0.0470, 0.0489, 0.0466, 0.0422, 0.0406], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:38:29,335 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339431.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:38:41,175 INFO [finetune.py:992] (1/2) Epoch 20, batch 7000, loss[loss=0.1489, simple_loss=0.2385, pruned_loss=0.02971, over 12109.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.252, pruned_loss=0.03563, over 2368801.07 frames. ], batch size: 33, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:38:45,518 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339454.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:02,521 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339479.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:14,650 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3598, 5.1944, 5.2804, 5.3258, 4.9758, 5.0085, 4.7686, 5.2459], device='cuda:1'), covar=tensor([0.0728, 0.0610, 0.0908, 0.0614, 0.1924, 0.1524, 0.0590, 0.1111], device='cuda:1'), in_proj_covar=tensor([0.0586, 0.0765, 0.0670, 0.0679, 0.0917, 0.0797, 0.0610, 0.0520], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 02:39:15,914 INFO [finetune.py:992] (1/2) Epoch 20, batch 7050, loss[loss=0.138, simple_loss=0.229, pruned_loss=0.02348, over 12029.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.252, pruned_loss=0.03564, over 2366584.94 frames. 
], batch size: 31, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:39:23,067 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.571e+02 3.008e+02 3.605e+02 9.495e+02, threshold=6.016e+02, percent-clipped=3.0 2023-05-19 02:39:31,480 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339520.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:32,530 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=3.36 vs. limit=5.0 2023-05-19 02:39:42,827 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 02:39:44,044 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339537.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:39:51,304 INFO [finetune.py:992] (1/2) Epoch 20, batch 7100, loss[loss=0.1403, simple_loss=0.2187, pruned_loss=0.03094, over 12289.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2526, pruned_loss=0.03597, over 2372341.36 frames. ], batch size: 28, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:40:05,229 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339568.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:40:26,037 INFO [finetune.py:992] (1/2) Epoch 20, batch 7150, loss[loss=0.1558, simple_loss=0.2573, pruned_loss=0.02711, over 12282.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2526, pruned_loss=0.036, over 2372003.17 frames. ], batch size: 37, lr: 3.08e-03, grad_scale: 8.0 2023-05-19 02:40:26,211 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339598.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:40:33,267 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.660e+02 2.593e+02 2.905e+02 3.471e+02 7.466e+02, threshold=5.810e+02, percent-clipped=2.0 2023-05-19 02:41:01,929 INFO [finetune.py:992] (1/2) Epoch 20, batch 7200, loss[loss=0.1676, simple_loss=0.2615, pruned_loss=0.03681, over 12145.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2524, pruned_loss=0.03591, over 2373382.69 frames. ], batch size: 36, lr: 3.08e-03, grad_scale: 16.0 2023-05-19 02:41:06,141 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339654.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:41:20,984 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.2131, 3.6936, 3.7279, 4.1128, 2.8135, 3.6415, 2.5364, 3.6633], device='cuda:1'), covar=tensor([0.1632, 0.0802, 0.0960, 0.0746, 0.1275, 0.0694, 0.1900, 0.0978], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0277, 0.0305, 0.0369, 0.0248, 0.0251, 0.0267, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:41:37,342 INFO [finetune.py:992] (1/2) Epoch 20, batch 7250, loss[loss=0.1648, simple_loss=0.2579, pruned_loss=0.03588, over 12171.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2514, pruned_loss=0.03538, over 2379855.72 frames. 
], batch size: 36, lr: 3.08e-03, grad_scale: 16.0 2023-05-19 02:41:44,196 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.590e+02 2.948e+02 3.480e+02 6.149e+02, threshold=5.896e+02, percent-clipped=1.0 2023-05-19 02:41:48,336 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339714.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:41:55,109 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339724.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:42:05,977 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-05-19 02:42:11,780 INFO [finetune.py:992] (1/2) Epoch 20, batch 7300, loss[loss=0.165, simple_loss=0.2632, pruned_loss=0.03339, over 12298.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.2513, pruned_loss=0.03544, over 2384007.81 frames. ], batch size: 34, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:42:12,540 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339749.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:42:14,779 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=339752.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:42:28,600 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=339772.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:42:30,767 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339775.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 02:42:46,930 INFO [finetune.py:992] (1/2) Epoch 20, batch 7350, loss[loss=0.1512, simple_loss=0.2525, pruned_loss=0.025, over 10798.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2532, pruned_loss=0.03596, over 2375796.33 frames. ], batch size: 68, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:42:54,210 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.704e+02 3.207e+02 3.847e+02 6.086e+02, threshold=6.413e+02, percent-clipped=1.0 2023-05-19 02:42:57,726 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=339813.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:43:22,082 INFO [finetune.py:992] (1/2) Epoch 20, batch 7400, loss[loss=0.1656, simple_loss=0.2556, pruned_loss=0.03784, over 12357.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2527, pruned_loss=0.03609, over 2378113.56 frames. ], batch size: 38, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:43:53,167 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=339893.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:43:56,582 INFO [finetune.py:992] (1/2) Epoch 20, batch 7450, loss[loss=0.1347, simple_loss=0.2235, pruned_loss=0.02291, over 12237.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2525, pruned_loss=0.03596, over 2377234.82 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:44:03,617 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.803e+02 3.200e+02 3.818e+02 1.775e+03, threshold=6.399e+02, percent-clipped=4.0 2023-05-19 02:44:32,255 INFO [finetune.py:992] (1/2) Epoch 20, batch 7500, loss[loss=0.1704, simple_loss=0.2663, pruned_loss=0.03727, over 12195.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2517, pruned_loss=0.03554, over 2378078.27 frames. ], batch size: 35, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:44:36,167 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. 
limit=2.0 2023-05-19 02:44:36,522 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=339954.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:44:40,888 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2023-05-19 02:44:43,250 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2089, 5.0283, 5.0310, 5.0358, 4.7462, 5.1082, 5.1935, 5.3221], device='cuda:1'), covar=tensor([0.0219, 0.0177, 0.0179, 0.0354, 0.0698, 0.0293, 0.0138, 0.0166], device='cuda:1'), in_proj_covar=tensor([0.0212, 0.0211, 0.0205, 0.0264, 0.0253, 0.0238, 0.0192, 0.0250], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:45:07,511 INFO [finetune.py:992] (1/2) Epoch 20, batch 7550, loss[loss=0.152, simple_loss=0.2429, pruned_loss=0.03056, over 12158.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2524, pruned_loss=0.0359, over 2374062.62 frames. ], batch size: 34, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:45:13,332 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340002.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:45:16,312 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6747, 3.6794, 3.2626, 3.0493, 2.8920, 2.7006, 3.6696, 2.3903], device='cuda:1'), covar=tensor([0.0422, 0.0172, 0.0235, 0.0290, 0.0436, 0.0385, 0.0140, 0.0566], device='cuda:1'), in_proj_covar=tensor([0.0202, 0.0172, 0.0178, 0.0202, 0.0210, 0.0207, 0.0183, 0.0214], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:45:17,377 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.442e+02 2.906e+02 3.531e+02 6.907e+02, threshold=5.812e+02, percent-clipped=1.0 2023-05-19 02:45:45,074 INFO [finetune.py:992] (1/2) Epoch 20, batch 7600, loss[loss=0.1416, simple_loss=0.239, pruned_loss=0.02204, over 12279.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2521, pruned_loss=0.03571, over 2381008.65 frames. ], batch size: 37, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:45:45,912 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:46:00,993 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=340070.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 02:46:19,704 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340097.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:46:20,268 INFO [finetune.py:992] (1/2) Epoch 20, batch 7650, loss[loss=0.1687, simple_loss=0.2619, pruned_loss=0.03776, over 12137.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2525, pruned_loss=0.03629, over 2371202.39 frames. ], batch size: 34, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:46:27,354 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5587, 2.8720, 3.6823, 4.5423, 4.0279, 4.6746, 4.0859, 3.4105], device='cuda:1'), covar=tensor([0.0062, 0.0397, 0.0162, 0.0070, 0.0117, 0.0088, 0.0131, 0.0348], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0128, 0.0109, 0.0086, 0.0111, 0.0123, 0.0108, 0.0143], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:46:27,810 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-05-19 02:46:27,853 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.934e+02 2.675e+02 2.965e+02 3.568e+02 6.455e+02, threshold=5.931e+02, percent-clipped=2.0 2023-05-19 02:46:27,929 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=340108.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 02:46:33,617 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.9387, 4.7664, 4.7696, 4.7989, 4.4654, 4.8618, 4.8901, 5.0490], device='cuda:1'), covar=tensor([0.0302, 0.0180, 0.0206, 0.0369, 0.0756, 0.0369, 0.0171, 0.0223], device='cuda:1'), in_proj_covar=tensor([0.0211, 0.0210, 0.0204, 0.0263, 0.0251, 0.0237, 0.0191, 0.0249], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 02:46:53,607 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-05-19 02:46:55,328 INFO [finetune.py:992] (1/2) Epoch 20, batch 7700, loss[loss=0.1903, simple_loss=0.2703, pruned_loss=0.05516, over 8118.00 frames. ], tot_loss[loss=0.1634, simple_loss=0.2533, pruned_loss=0.03674, over 2359581.01 frames. ], batch size: 97, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:47:26,580 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340193.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:47:29,959 INFO [finetune.py:992] (1/2) Epoch 20, batch 7750, loss[loss=0.1518, simple_loss=0.2287, pruned_loss=0.03742, over 12352.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2527, pruned_loss=0.03648, over 2368455.97 frames. ], batch size: 30, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:47:37,893 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.635e+02 3.142e+02 3.673e+02 6.224e+02, threshold=6.284e+02, percent-clipped=2.0 2023-05-19 02:48:00,641 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340241.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:48:05,993 INFO [finetune.py:992] (1/2) Epoch 20, batch 7800, loss[loss=0.1595, simple_loss=0.2494, pruned_loss=0.03479, over 12108.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2528, pruned_loss=0.03656, over 2365459.16 frames. ], batch size: 33, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:48:38,894 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-05-19 02:48:40,422 INFO [finetune.py:992] (1/2) Epoch 20, batch 7850, loss[loss=0.1582, simple_loss=0.2557, pruned_loss=0.03033, over 11637.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2524, pruned_loss=0.03646, over 2367610.42 frames. ], batch size: 48, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:48:47,479 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.642e+02 2.941e+02 3.441e+02 7.528e+02, threshold=5.881e+02, percent-clipped=3.0 2023-05-19 02:49:15,085 INFO [finetune.py:992] (1/2) Epoch 20, batch 7900, loss[loss=0.1922, simple_loss=0.2899, pruned_loss=0.04724, over 12131.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2531, pruned_loss=0.03663, over 2361926.52 frames. ], batch size: 39, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:49:31,144 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340370.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:49:51,032 INFO [finetune.py:992] (1/2) Epoch 20, batch 7950, loss[loss=0.1786, simple_loss=0.2658, pruned_loss=0.04573, over 12264.00 frames. 
], tot_loss[loss=0.1628, simple_loss=0.2524, pruned_loss=0.03664, over 2354875.23 frames. ], batch size: 37, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:49:58,269 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 2.568e+02 3.037e+02 3.883e+02 7.199e+02, threshold=6.075e+02, percent-clipped=3.0 2023-05-19 02:49:58,431 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=340408.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:50:05,389 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340418.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:50:25,832 INFO [finetune.py:992] (1/2) Epoch 20, batch 8000, loss[loss=0.1699, simple_loss=0.2596, pruned_loss=0.0401, over 12125.00 frames. ], tot_loss[loss=0.163, simple_loss=0.253, pruned_loss=0.03649, over 2363146.75 frames. ], batch size: 39, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:50:28,945 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.1453, 3.8979, 3.9934, 4.3168, 2.8477, 3.7914, 2.4692, 4.0250], device='cuda:1'), covar=tensor([0.1667, 0.0826, 0.0979, 0.0679, 0.1228, 0.0698, 0.2038, 0.1056], device='cuda:1'), in_proj_covar=tensor([0.0234, 0.0278, 0.0304, 0.0370, 0.0248, 0.0251, 0.0268, 0.0379], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:50:31,585 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=340456.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 02:50:48,638 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5112, 2.7638, 3.2601, 4.2447, 2.2847, 4.3700, 4.5047, 4.4970], device='cuda:1'), covar=tensor([0.0134, 0.1243, 0.0540, 0.0188, 0.1522, 0.0316, 0.0160, 0.0115], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0209, 0.0189, 0.0130, 0.0192, 0.0189, 0.0188, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:51:00,145 INFO [finetune.py:992] (1/2) Epoch 20, batch 8050, loss[loss=0.152, simple_loss=0.239, pruned_loss=0.03254, over 12110.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2531, pruned_loss=0.03691, over 2355938.54 frames. 
], batch size: 33, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:51:07,538 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.771e+02 3.199e+02 3.982e+02 8.262e+02, threshold=6.397e+02, percent-clipped=3.0 2023-05-19 02:51:19,474 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=340525.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:51:27,105 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5545, 5.1418, 5.5563, 4.8750, 5.2155, 4.9195, 5.5837, 5.2070], device='cuda:1'), covar=tensor([0.0272, 0.0370, 0.0255, 0.0252, 0.0410, 0.0322, 0.0186, 0.0257], device='cuda:1'), in_proj_covar=tensor([0.0293, 0.0291, 0.0320, 0.0287, 0.0287, 0.0288, 0.0266, 0.0237], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:51:28,969 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8036, 5.5100, 5.0000, 4.9658, 5.6146, 4.9188, 5.0557, 4.9984], device='cuda:1'), covar=tensor([0.1307, 0.1028, 0.1204, 0.1934, 0.0899, 0.2117, 0.1945, 0.1274], device='cuda:1'), in_proj_covar=tensor([0.0376, 0.0529, 0.0423, 0.0469, 0.0487, 0.0465, 0.0423, 0.0406], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:51:35,845 INFO [finetune.py:992] (1/2) Epoch 20, batch 8100, loss[loss=0.1539, simple_loss=0.2474, pruned_loss=0.03021, over 12130.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2524, pruned_loss=0.03642, over 2362231.45 frames. ], batch size: 33, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:52:02,199 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=340586.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:52:10,370 INFO [finetune.py:992] (1/2) Epoch 20, batch 8150, loss[loss=0.1783, simple_loss=0.2718, pruned_loss=0.04244, over 11568.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.252, pruned_loss=0.03623, over 2368843.40 frames. ], batch size: 48, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:52:10,565 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0879, 4.7085, 4.7344, 5.0215, 4.7397, 4.9787, 4.9061, 2.5540], device='cuda:1'), covar=tensor([0.0096, 0.0078, 0.0110, 0.0056, 0.0062, 0.0106, 0.0079, 0.1040], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0086, 0.0090, 0.0078, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:52:17,438 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 2.602e+02 3.226e+02 3.872e+02 5.642e+02, threshold=6.452e+02, percent-clipped=0.0 2023-05-19 02:52:45,738 INFO [finetune.py:992] (1/2) Epoch 20, batch 8200, loss[loss=0.1631, simple_loss=0.2516, pruned_loss=0.03733, over 12281.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.253, pruned_loss=0.03659, over 2364828.58 frames. 
], batch size: 33, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:52:54,327 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7422, 2.4122, 3.1252, 3.6154, 3.3919, 3.7809, 3.4467, 2.6078], device='cuda:1'), covar=tensor([0.0070, 0.0413, 0.0206, 0.0084, 0.0168, 0.0096, 0.0152, 0.0417], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0126, 0.0108, 0.0086, 0.0110, 0.0122, 0.0107, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:53:14,386 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7934, 4.2647, 3.8596, 4.7692, 4.2063, 2.8347, 3.9474, 2.9057], device='cuda:1'), covar=tensor([0.1127, 0.1015, 0.1526, 0.0503, 0.1334, 0.1852, 0.1247, 0.3456], device='cuda:1'), in_proj_covar=tensor([0.0322, 0.0391, 0.0373, 0.0355, 0.0381, 0.0284, 0.0360, 0.0375], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:53:20,912 INFO [finetune.py:992] (1/2) Epoch 20, batch 8250, loss[loss=0.1401, simple_loss=0.2278, pruned_loss=0.02621, over 12369.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.2535, pruned_loss=0.03673, over 2359038.26 frames. ], batch size: 30, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:53:27,648 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.809e+02 2.637e+02 3.079e+02 3.565e+02 9.552e+02, threshold=6.158e+02, percent-clipped=2.0 2023-05-19 02:53:35,294 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 02:53:55,556 INFO [finetune.py:992] (1/2) Epoch 20, batch 8300, loss[loss=0.1768, simple_loss=0.2698, pruned_loss=0.04188, over 12027.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.2531, pruned_loss=0.03659, over 2363155.43 frames. ], batch size: 40, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:54:30,933 INFO [finetune.py:992] (1/2) Epoch 20, batch 8350, loss[loss=0.1729, simple_loss=0.2623, pruned_loss=0.04175, over 12179.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2536, pruned_loss=0.037, over 2357225.34 frames. ], batch size: 35, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:54:38,240 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.799e+02 2.533e+02 3.136e+02 3.721e+02 8.819e+02, threshold=6.271e+02, percent-clipped=3.0 2023-05-19 02:54:53,636 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.24 vs. limit=5.0 2023-05-19 02:55:06,368 INFO [finetune.py:992] (1/2) Epoch 20, batch 8400, loss[loss=0.1531, simple_loss=0.2516, pruned_loss=0.0273, over 12277.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2533, pruned_loss=0.03655, over 2370930.09 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 16.0 2023-05-19 02:55:06,832 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-05-19 02:55:29,456 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=340881.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:55:38,578 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5229, 2.5847, 3.5879, 4.5098, 3.8074, 4.5205, 3.9961, 3.2059], device='cuda:1'), covar=tensor([0.0056, 0.0442, 0.0197, 0.0051, 0.0161, 0.0082, 0.0148, 0.0405], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0127, 0.0108, 0.0086, 0.0109, 0.0122, 0.0107, 0.0141], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 02:55:41,230 INFO [finetune.py:992] (1/2) Epoch 20, batch 8450, loss[loss=0.1634, simple_loss=0.2529, pruned_loss=0.03695, over 12187.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2525, pruned_loss=0.03577, over 2373951.27 frames. ], batch size: 35, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:55:46,126 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6487, 4.6285, 4.4718, 4.0751, 4.2538, 4.5706, 4.3263, 4.1674], device='cuda:1'), covar=tensor([0.0900, 0.1032, 0.0820, 0.1746, 0.1817, 0.1024, 0.1595, 0.1129], device='cuda:1'), in_proj_covar=tensor([0.0668, 0.0604, 0.0561, 0.0683, 0.0458, 0.0788, 0.0827, 0.0596], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:1') 2023-05-19 02:55:48,753 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.491e+02 2.887e+02 3.519e+02 8.835e+02, threshold=5.775e+02, percent-clipped=2.0 2023-05-19 02:56:16,150 INFO [finetune.py:992] (1/2) Epoch 20, batch 8500, loss[loss=0.1688, simple_loss=0.247, pruned_loss=0.0453, over 12101.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2532, pruned_loss=0.03627, over 2363638.92 frames. ], batch size: 32, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:56:26,022 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.5341, 4.9215, 3.4158, 3.0636, 4.2380, 2.9404, 4.1010, 3.6116], device='cuda:1'), covar=tensor([0.0705, 0.0684, 0.1012, 0.1389, 0.0299, 0.1258, 0.0541, 0.0759], device='cuda:1'), in_proj_covar=tensor([0.0190, 0.0264, 0.0179, 0.0203, 0.0147, 0.0185, 0.0203, 0.0178], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:56:51,254 INFO [finetune.py:992] (1/2) Epoch 20, batch 8550, loss[loss=0.1496, simple_loss=0.2295, pruned_loss=0.03488, over 11851.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2529, pruned_loss=0.03621, over 2360767.59 frames. ], batch size: 26, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:56:59,233 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.772e+02 3.141e+02 3.674e+02 1.798e+03, threshold=6.281e+02, percent-clipped=4.0 2023-05-19 02:57:12,857 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-05-19 02:57:26,121 INFO [finetune.py:992] (1/2) Epoch 20, batch 8600, loss[loss=0.1743, simple_loss=0.2609, pruned_loss=0.04387, over 11179.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2522, pruned_loss=0.03607, over 2359468.45 frames. 
], batch size: 55, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:57:31,836 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.4484, 4.9365, 5.4036, 4.7700, 5.0363, 4.8077, 5.4592, 5.0695], device='cuda:1'), covar=tensor([0.0304, 0.0471, 0.0298, 0.0277, 0.0471, 0.0378, 0.0214, 0.0320], device='cuda:1'), in_proj_covar=tensor([0.0291, 0.0290, 0.0318, 0.0285, 0.0286, 0.0287, 0.0264, 0.0235], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:57:39,562 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6849, 2.7855, 4.4878, 4.7313, 2.8677, 2.5577, 2.9613, 2.2145], device='cuda:1'), covar=tensor([0.1767, 0.3079, 0.0510, 0.0435, 0.1411, 0.2818, 0.2941, 0.4276], device='cuda:1'), in_proj_covar=tensor([0.0322, 0.0405, 0.0290, 0.0319, 0.0291, 0.0335, 0.0418, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:57:44,230 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4149, 4.7212, 3.2023, 2.6861, 4.2011, 2.4266, 4.0964, 3.2781], device='cuda:1'), covar=tensor([0.0695, 0.0586, 0.1069, 0.1799, 0.0277, 0.1812, 0.0483, 0.1024], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0264, 0.0180, 0.0202, 0.0147, 0.0185, 0.0203, 0.0178], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:58:01,386 INFO [finetune.py:992] (1/2) Epoch 20, batch 8650, loss[loss=0.1592, simple_loss=0.2576, pruned_loss=0.03047, over 12278.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2519, pruned_loss=0.03585, over 2359653.59 frames. ], batch size: 37, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:58:09,054 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.571e+02 3.077e+02 3.507e+02 6.175e+02, threshold=6.155e+02, percent-clipped=0.0 2023-05-19 02:58:36,611 INFO [finetune.py:992] (1/2) Epoch 20, batch 8700, loss[loss=0.1469, simple_loss=0.2271, pruned_loss=0.03335, over 12244.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.2527, pruned_loss=0.03597, over 2370215.44 frames. ], batch size: 28, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:58:59,533 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=341181.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:59:11,283 INFO [finetune.py:992] (1/2) Epoch 20, batch 8750, loss[loss=0.1657, simple_loss=0.258, pruned_loss=0.03664, over 12023.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2529, pruned_loss=0.03574, over 2372609.45 frames. 
], batch size: 40, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 02:59:13,885 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6811, 2.8241, 4.4705, 4.6165, 2.8260, 2.5927, 2.9705, 2.1710], device='cuda:1'), covar=tensor([0.1824, 0.3073, 0.0497, 0.0461, 0.1452, 0.2647, 0.2890, 0.4189], device='cuda:1'), in_proj_covar=tensor([0.0321, 0.0403, 0.0290, 0.0318, 0.0291, 0.0334, 0.0416, 0.0390], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 02:59:14,512 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.2458, 4.6772, 3.0842, 2.7584, 3.9507, 2.7065, 3.9576, 3.2914], device='cuda:1'), covar=tensor([0.0813, 0.0575, 0.1196, 0.1571, 0.0377, 0.1354, 0.0493, 0.0859], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0267, 0.0182, 0.0204, 0.0149, 0.0187, 0.0205, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 02:59:19,077 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.539e+02 3.018e+02 3.610e+02 6.559e+02, threshold=6.037e+02, percent-clipped=4.0 2023-05-19 02:59:25,809 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-05-19 02:59:33,508 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=341229.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 02:59:46,909 INFO [finetune.py:992] (1/2) Epoch 20, batch 8800, loss[loss=0.1785, simple_loss=0.2685, pruned_loss=0.04419, over 12132.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2519, pruned_loss=0.03535, over 2369029.36 frames. ], batch size: 39, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:00:08,234 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6849, 2.8372, 3.3713, 4.4686, 2.4015, 4.5445, 4.6443, 4.6737], device='cuda:1'), covar=tensor([0.0141, 0.1285, 0.0495, 0.0189, 0.1561, 0.0259, 0.0192, 0.0106], device='cuda:1'), in_proj_covar=tensor([0.0130, 0.0209, 0.0187, 0.0129, 0.0191, 0.0186, 0.0187, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:00:21,937 INFO [finetune.py:992] (1/2) Epoch 20, batch 8850, loss[loss=0.1464, simple_loss=0.2353, pruned_loss=0.02881, over 12303.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2515, pruned_loss=0.03501, over 2379582.27 frames. ], batch size: 33, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:00:29,143 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.6663, 2.8103, 4.3636, 4.5134, 2.7477, 2.5445, 2.8770, 2.1675], device='cuda:1'), covar=tensor([0.1911, 0.3228, 0.0560, 0.0502, 0.1544, 0.2830, 0.2997, 0.4196], device='cuda:1'), in_proj_covar=tensor([0.0323, 0.0406, 0.0291, 0.0320, 0.0292, 0.0336, 0.0418, 0.0392], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:00:29,539 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 2.712e+02 3.085e+02 3.586e+02 5.651e+02, threshold=6.171e+02, percent-clipped=0.0 2023-05-19 03:00:31,220 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.22 vs. limit=5.0 2023-05-19 03:00:56,632 INFO [finetune.py:992] (1/2) Epoch 20, batch 8900, loss[loss=0.1738, simple_loss=0.2726, pruned_loss=0.03746, over 11170.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2509, pruned_loss=0.0349, over 2385839.51 frames. 
], batch size: 55, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:01:11,580 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.95 vs. limit=5.0 2023-05-19 03:01:31,959 INFO [finetune.py:992] (1/2) Epoch 20, batch 8950, loss[loss=0.1252, simple_loss=0.209, pruned_loss=0.02073, over 11812.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2505, pruned_loss=0.03492, over 2387359.18 frames. ], batch size: 26, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:01:39,782 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.868e+02 2.654e+02 3.068e+02 3.706e+02 5.629e+02, threshold=6.135e+02, percent-clipped=0.0 2023-05-19 03:01:55,077 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4842, 2.4611, 3.5728, 4.4059, 3.8739, 4.4660, 3.9416, 3.0171], device='cuda:1'), covar=tensor([0.0051, 0.0441, 0.0158, 0.0059, 0.0143, 0.0083, 0.0136, 0.0398], device='cuda:1'), in_proj_covar=tensor([0.0095, 0.0125, 0.0108, 0.0086, 0.0109, 0.0121, 0.0107, 0.0140], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 03:02:07,759 INFO [finetune.py:992] (1/2) Epoch 20, batch 9000, loss[loss=0.1381, simple_loss=0.2256, pruned_loss=0.02533, over 12188.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2505, pruned_loss=0.03504, over 2384029.88 frames. ], batch size: 29, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:02:07,759 INFO [finetune.py:1017] (1/2) Computing validation loss 2023-05-19 03:02:12,440 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7220, 3.0135, 4.3311, 4.4700, 3.0113, 2.6306, 2.9878, 2.1577], device='cuda:1'), covar=tensor([0.1834, 0.3037, 0.0515, 0.0475, 0.1422, 0.2852, 0.3039, 0.4673], device='cuda:1'), in_proj_covar=tensor([0.0322, 0.0405, 0.0290, 0.0318, 0.0291, 0.0334, 0.0417, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:02:25,417 INFO [finetune.py:1026] (1/2) Epoch 20, validation: loss=0.3184, simple_loss=0.392, pruned_loss=0.1224, over 1020973.00 frames. 2023-05-19 03:02:25,418 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12856MB 2023-05-19 03:02:41,082 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.68 vs. limit=2.0 2023-05-19 03:03:00,708 INFO [finetune.py:992] (1/2) Epoch 20, batch 9050, loss[loss=0.1412, simple_loss=0.2242, pruned_loss=0.02905, over 12340.00 frames. ], tot_loss[loss=0.16, simple_loss=0.2503, pruned_loss=0.03486, over 2382315.74 frames. ], batch size: 31, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:03:03,021 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.5145, 2.7114, 3.2500, 4.3970, 2.3306, 4.4163, 4.4878, 4.5671], device='cuda:1'), covar=tensor([0.0137, 0.1301, 0.0533, 0.0172, 0.1477, 0.0268, 0.0205, 0.0105], device='cuda:1'), in_proj_covar=tensor([0.0130, 0.0210, 0.0188, 0.0130, 0.0191, 0.0187, 0.0188, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:03:08,225 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. 
limit=2.0 2023-05-19 03:03:08,446 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.553e+02 3.007e+02 3.703e+02 8.660e+02, threshold=6.014e+02, percent-clipped=2.0 2023-05-19 03:03:28,451 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.3708, 6.1265, 5.6954, 5.6479, 6.2236, 5.4625, 5.6143, 5.6299], device='cuda:1'), covar=tensor([0.1435, 0.0921, 0.0909, 0.1860, 0.0835, 0.1998, 0.1972, 0.1123], device='cuda:1'), in_proj_covar=tensor([0.0382, 0.0535, 0.0426, 0.0477, 0.0491, 0.0471, 0.0426, 0.0411], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 03:03:31,320 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3431, 4.7496, 2.9576, 2.6839, 4.0534, 2.5919, 3.9844, 3.1993], device='cuda:1'), covar=tensor([0.0755, 0.0431, 0.1250, 0.1588, 0.0303, 0.1431, 0.0495, 0.0891], device='cuda:1'), in_proj_covar=tensor([0.0191, 0.0263, 0.0179, 0.0203, 0.0147, 0.0185, 0.0202, 0.0178], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 03:03:32,149 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.97 vs. limit=5.0 2023-05-19 03:03:36,080 INFO [finetune.py:992] (1/2) Epoch 20, batch 9100, loss[loss=0.1554, simple_loss=0.2523, pruned_loss=0.02929, over 12324.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2505, pruned_loss=0.035, over 2382093.30 frames. ], batch size: 34, lr: 3.07e-03, grad_scale: 8.0 2023-05-19 03:03:44,581 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1481, 6.0350, 5.6201, 5.4940, 6.1372, 5.2989, 5.5440, 5.5134], device='cuda:1'), covar=tensor([0.1438, 0.0906, 0.0949, 0.1830, 0.0791, 0.2319, 0.1689, 0.1115], device='cuda:1'), in_proj_covar=tensor([0.0382, 0.0535, 0.0427, 0.0477, 0.0491, 0.0471, 0.0426, 0.0411], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 03:04:10,625 INFO [finetune.py:992] (1/2) Epoch 20, batch 9150, loss[loss=0.1798, simple_loss=0.2657, pruned_loss=0.04694, over 12150.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2507, pruned_loss=0.0352, over 2388170.26 frames. ], batch size: 34, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:04:18,556 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.573e+02 3.080e+02 3.663e+02 9.571e+02, threshold=6.160e+02, percent-clipped=3.0 2023-05-19 03:04:36,163 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8012, 3.0261, 4.7895, 4.8913, 2.7958, 2.7160, 3.0884, 2.3847], device='cuda:1'), covar=tensor([0.1825, 0.2999, 0.0441, 0.0449, 0.1514, 0.2833, 0.3013, 0.4111], device='cuda:1'), in_proj_covar=tensor([0.0322, 0.0405, 0.0290, 0.0319, 0.0292, 0.0335, 0.0417, 0.0391], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:04:46,445 INFO [finetune.py:992] (1/2) Epoch 20, batch 9200, loss[loss=0.1371, simple_loss=0.2195, pruned_loss=0.0273, over 12272.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2512, pruned_loss=0.03567, over 2381540.03 frames. ], batch size: 28, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:05:21,428 INFO [finetune.py:992] (1/2) Epoch 20, batch 9250, loss[loss=0.1881, simple_loss=0.2741, pruned_loss=0.0511, over 12044.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2517, pruned_loss=0.03584, over 2380242.20 frames. 
], batch size: 37, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:05:28,935 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.941e+02 2.659e+02 3.200e+02 3.746e+02 7.852e+02, threshold=6.401e+02, percent-clipped=2.0 2023-05-19 03:05:50,078 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.3957, 4.8150, 3.1722, 2.7723, 4.1200, 2.9348, 4.0576, 3.4829], device='cuda:1'), covar=tensor([0.0793, 0.0627, 0.1111, 0.1515, 0.0362, 0.1284, 0.0598, 0.0775], device='cuda:1'), in_proj_covar=tensor([0.0193, 0.0267, 0.0182, 0.0205, 0.0149, 0.0187, 0.0205, 0.0180], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 03:05:56,115 INFO [finetune.py:992] (1/2) Epoch 20, batch 9300, loss[loss=0.1614, simple_loss=0.2578, pruned_loss=0.03248, over 12202.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2514, pruned_loss=0.03554, over 2378866.96 frames. ], batch size: 35, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:06:22,818 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. limit=2.0 2023-05-19 03:06:31,479 INFO [finetune.py:992] (1/2) Epoch 20, batch 9350, loss[loss=0.1562, simple_loss=0.24, pruned_loss=0.03617, over 12421.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2513, pruned_loss=0.03551, over 2375159.22 frames. ], batch size: 32, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:06:39,947 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.662e+02 3.184e+02 3.616e+02 6.301e+02, threshold=6.367e+02, percent-clipped=0.0 2023-05-19 03:06:42,280 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0956, 4.6869, 4.7694, 4.8836, 4.7813, 4.8579, 4.7487, 2.6075], device='cuda:1'), covar=tensor([0.0081, 0.0077, 0.0106, 0.0062, 0.0057, 0.0118, 0.0088, 0.0949], device='cuda:1'), in_proj_covar=tensor([0.0074, 0.0086, 0.0089, 0.0079, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 03:07:00,543 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-19 03:07:02,484 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=341841.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:07:03,364 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.49 vs. limit=5.0 2023-05-19 03:07:07,273 INFO [finetune.py:992] (1/2) Epoch 20, batch 9400, loss[loss=0.1853, simple_loss=0.2769, pruned_loss=0.04683, over 11805.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.2509, pruned_loss=0.03523, over 2377493.21 frames. ], batch size: 44, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:07:11,803 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-05-19 03:07:24,271 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.90 vs. limit=5.0 2023-05-19 03:07:42,739 INFO [finetune.py:992] (1/2) Epoch 20, batch 9450, loss[loss=0.1446, simple_loss=0.2322, pruned_loss=0.02844, over 12186.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.2508, pruned_loss=0.03501, over 2376546.34 frames. 
], batch size: 31, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:07:45,775 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=341902.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:07:50,489 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.800e+02 2.708e+02 3.154e+02 3.636e+02 6.774e+02, threshold=6.308e+02, percent-clipped=1.0 2023-05-19 03:07:57,760 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-19 03:08:17,665 INFO [finetune.py:992] (1/2) Epoch 20, batch 9500, loss[loss=0.2362, simple_loss=0.3098, pruned_loss=0.08126, over 8261.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.03565, over 2370902.60 frames. ], batch size: 98, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:08:36,818 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-05-19 03:08:46,042 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=341988.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:08:52,749 INFO [finetune.py:992] (1/2) Epoch 20, batch 9550, loss[loss=0.1763, simple_loss=0.2689, pruned_loss=0.04185, over 12130.00 frames. ], tot_loss[loss=0.1621, simple_loss=0.2523, pruned_loss=0.0359, over 2361988.26 frames. ], batch size: 39, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:09:03,125 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.737e+02 2.794e+02 3.289e+02 4.342e+02 9.721e+02, threshold=6.577e+02, percent-clipped=4.0 2023-05-19 03:09:30,918 INFO [finetune.py:992] (1/2) Epoch 20, batch 9600, loss[loss=0.1618, simple_loss=0.2616, pruned_loss=0.03096, over 12373.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.2522, pruned_loss=0.0358, over 2362943.56 frames. ], batch size: 38, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:09:31,834 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:09:36,098 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.0070, 4.6798, 4.7794, 4.8471, 4.6644, 4.9285, 4.7856, 2.7063], device='cuda:1'), covar=tensor([0.0094, 0.0082, 0.0104, 0.0058, 0.0062, 0.0103, 0.0079, 0.0879], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0087, 0.0090, 0.0079, 0.0065, 0.0100, 0.0087, 0.0104], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 03:10:06,486 INFO [finetune.py:992] (1/2) Epoch 20, batch 9650, loss[loss=0.1517, simple_loss=0.2402, pruned_loss=0.03154, over 12263.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2516, pruned_loss=0.03547, over 2366408.35 frames. ], batch size: 32, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:10:07,520 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. 
limit=2.0 2023-05-19 03:10:14,312 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.863e+02 2.471e+02 2.894e+02 3.503e+02 1.028e+03, threshold=5.788e+02, percent-clipped=2.0 2023-05-19 03:10:35,937 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7322, 3.7685, 3.3574, 3.2184, 2.9106, 2.7594, 3.6957, 2.5127], device='cuda:1'), covar=tensor([0.0424, 0.0129, 0.0235, 0.0262, 0.0455, 0.0447, 0.0144, 0.0512], device='cuda:1'), in_proj_covar=tensor([0.0202, 0.0172, 0.0179, 0.0203, 0.0210, 0.0207, 0.0183, 0.0214], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:10:37,463 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1370, 4.6155, 3.9228, 4.9226, 4.3130, 2.8614, 4.0987, 2.9988], device='cuda:1'), covar=tensor([0.0935, 0.0869, 0.1666, 0.0495, 0.1378, 0.1850, 0.1216, 0.3483], device='cuda:1'), in_proj_covar=tensor([0.0318, 0.0388, 0.0370, 0.0353, 0.0380, 0.0282, 0.0356, 0.0372], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:10:41,437 INFO [finetune.py:992] (1/2) Epoch 20, batch 9700, loss[loss=0.1799, simple_loss=0.2679, pruned_loss=0.04596, over 12038.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2514, pruned_loss=0.03552, over 2367376.69 frames. ], batch size: 40, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:11:16,290 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342197.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:11:16,897 INFO [finetune.py:992] (1/2) Epoch 20, batch 9750, loss[loss=0.1713, simple_loss=0.2627, pruned_loss=0.03995, over 12027.00 frames. ], tot_loss[loss=0.161, simple_loss=0.2512, pruned_loss=0.03539, over 2369253.35 frames. ], batch size: 42, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:11:24,910 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.576e+02 3.145e+02 4.023e+02 1.000e+03, threshold=6.290e+02, percent-clipped=4.0 2023-05-19 03:11:48,579 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5092, 5.4824, 5.2961, 4.7859, 4.8815, 5.4416, 5.0954, 4.8799], device='cuda:1'), covar=tensor([0.0790, 0.1053, 0.0743, 0.1698, 0.1029, 0.0798, 0.1486, 0.0922], device='cuda:1'), in_proj_covar=tensor([0.0661, 0.0601, 0.0558, 0.0677, 0.0455, 0.0780, 0.0829, 0.0595], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0004, 0.0004, 0.0002], device='cuda:1') 2023-05-19 03:11:50,667 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8425, 2.9145, 4.7516, 4.8996, 2.9860, 2.6157, 3.1751, 2.3398], device='cuda:1'), covar=tensor([0.1604, 0.2981, 0.0395, 0.0395, 0.1311, 0.2739, 0.2695, 0.3964], device='cuda:1'), in_proj_covar=tensor([0.0320, 0.0403, 0.0288, 0.0317, 0.0290, 0.0333, 0.0415, 0.0388], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:11:52,284 INFO [finetune.py:992] (1/2) Epoch 20, batch 9800, loss[loss=0.1851, simple_loss=0.2728, pruned_loss=0.04869, over 12174.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2515, pruned_loss=0.03594, over 2370422.52 frames. ], batch size: 35, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:12:13,510 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. 
limit=2.0 2023-05-19 03:12:27,139 INFO [finetune.py:992] (1/2) Epoch 20, batch 9850, loss[loss=0.1642, simple_loss=0.2604, pruned_loss=0.03402, over 12353.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2511, pruned_loss=0.03563, over 2378079.02 frames. ], batch size: 36, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:12:35,021 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.707e+02 2.626e+02 3.190e+02 3.767e+02 5.827e+02, threshold=6.380e+02, percent-clipped=0.0 2023-05-19 03:12:36,502 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342311.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 03:13:00,206 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342344.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:13:02,727 INFO [finetune.py:992] (1/2) Epoch 20, batch 9900, loss[loss=0.1472, simple_loss=0.2427, pruned_loss=0.02587, over 12269.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2516, pruned_loss=0.03579, over 2365429.69 frames. ], batch size: 37, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:13:19,573 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342372.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 03:13:27,878 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.1388, 4.8441, 5.1195, 5.1124, 4.3220, 4.4693, 4.5752, 4.8605], device='cuda:1'), covar=tensor([0.1071, 0.1312, 0.1058, 0.0982, 0.3779, 0.2348, 0.0872, 0.1926], device='cuda:1'), in_proj_covar=tensor([0.0588, 0.0777, 0.0671, 0.0688, 0.0927, 0.0805, 0.0615, 0.0520], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0003], device='cuda:1') 2023-05-19 03:13:38,033 INFO [finetune.py:992] (1/2) Epoch 20, batch 9950, loss[loss=0.1685, simple_loss=0.2551, pruned_loss=0.04098, over 12080.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2531, pruned_loss=0.03604, over 2365338.85 frames. ], batch size: 32, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:13:43,557 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-05-19 03:13:46,015 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.702e+02 3.172e+02 3.899e+02 3.372e+03, threshold=6.344e+02, percent-clipped=3.0 2023-05-19 03:13:47,680 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.45 vs. limit=5.0 2023-05-19 03:13:52,324 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1656, 2.2887, 3.0039, 4.0127, 2.0773, 4.1268, 4.1350, 4.2172], device='cuda:1'), covar=tensor([0.0153, 0.1416, 0.0562, 0.0206, 0.1449, 0.0283, 0.0232, 0.0126], device='cuda:1'), in_proj_covar=tensor([0.0129, 0.0209, 0.0188, 0.0130, 0.0189, 0.0187, 0.0188, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:13:55,499 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.63 vs. limit=5.0 2023-05-19 03:14:12,607 INFO [finetune.py:992] (1/2) Epoch 20, batch 10000, loss[loss=0.1726, simple_loss=0.2641, pruned_loss=0.04057, over 12031.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2531, pruned_loss=0.036, over 2371446.33 frames. ], batch size: 40, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:14:43,191 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.69 vs. 
limit=5.0 2023-05-19 03:14:47,307 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=342497.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:14:47,895 INFO [finetune.py:992] (1/2) Epoch 20, batch 10050, loss[loss=0.1631, simple_loss=0.2545, pruned_loss=0.03584, over 12198.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2521, pruned_loss=0.03571, over 2375629.72 frames. ], batch size: 35, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:14:55,631 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.629e+02 3.145e+02 3.868e+02 7.423e+02, threshold=6.290e+02, percent-clipped=1.0 2023-05-19 03:15:14,320 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342535.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:17,670 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342540.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:21,108 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=342545.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:23,110 INFO [finetune.py:992] (1/2) Epoch 20, batch 10100, loss[loss=0.1738, simple_loss=0.2743, pruned_loss=0.03665, over 11616.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2515, pruned_loss=0.0356, over 2371356.20 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:15:39,577 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 03:15:42,880 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3742, 4.7255, 4.1585, 5.0780, 4.6228, 3.0409, 4.1545, 3.1091], device='cuda:1'), covar=tensor([0.0833, 0.0784, 0.1397, 0.0478, 0.1135, 0.1770, 0.1195, 0.3269], device='cuda:1'), in_proj_covar=tensor([0.0322, 0.0393, 0.0374, 0.0357, 0.0385, 0.0286, 0.0361, 0.0376], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:15:56,381 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342596.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:15:57,539 INFO [finetune.py:992] (1/2) Epoch 20, batch 10150, loss[loss=0.1979, simple_loss=0.2823, pruned_loss=0.05678, over 8389.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2511, pruned_loss=0.03564, over 2369906.96 frames. ], batch size: 98, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:16:00,053 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342601.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:16:05,440 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.782e+02 3.170e+02 3.709e+02 6.976e+02, threshold=6.340e+02, percent-clipped=1.0 2023-05-19 03:16:30,139 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=342644.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:16:32,840 INFO [finetune.py:992] (1/2) Epoch 20, batch 10200, loss[loss=0.2107, simple_loss=0.2986, pruned_loss=0.06146, over 7754.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2521, pruned_loss=0.03565, over 2368120.14 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:16:38,281 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. 
limit=2.0 2023-05-19 03:16:46,101 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342667.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 03:17:02,306 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.8763, 3.5714, 5.3066, 2.8392, 2.9765, 3.8277, 3.3699, 3.8889], device='cuda:1'), covar=tensor([0.0463, 0.1152, 0.0281, 0.1243, 0.2010, 0.1657, 0.1410, 0.1178], device='cuda:1'), in_proj_covar=tensor([0.0249, 0.0247, 0.0274, 0.0194, 0.0246, 0.0305, 0.0236, 0.0281], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 03:17:04,230 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=342692.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:17:05,093 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342693.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:17:08,503 INFO [finetune.py:992] (1/2) Epoch 20, batch 10250, loss[loss=0.1507, simple_loss=0.2396, pruned_loss=0.03093, over 12364.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.2507, pruned_loss=0.03549, over 2372725.61 frames. ], batch size: 30, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:17:16,065 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.825e+02 2.540e+02 2.941e+02 3.617e+02 1.365e+03, threshold=5.881e+02, percent-clipped=2.0 2023-05-19 03:17:43,273 INFO [finetune.py:992] (1/2) Epoch 20, batch 10300, loss[loss=0.2241, simple_loss=0.3028, pruned_loss=0.07271, over 7994.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.2504, pruned_loss=0.03538, over 2374417.89 frames. ], batch size: 98, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:17:47,651 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342754.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:18:08,656 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-05-19 03:18:18,768 INFO [finetune.py:992] (1/2) Epoch 20, batch 10350, loss[loss=0.1709, simple_loss=0.2673, pruned_loss=0.03728, over 11210.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.2513, pruned_loss=0.03551, over 2366476.57 frames. ], batch size: 55, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:18:26,861 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.836e+02 3.182e+02 3.621e+02 7.914e+02, threshold=6.363e+02, percent-clipped=3.0 2023-05-19 03:18:29,835 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=342813.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:18:54,009 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.2788, 5.1129, 5.1295, 5.1761, 4.7607, 5.2459, 5.3641, 5.4277], device='cuda:1'), covar=tensor([0.0196, 0.0179, 0.0168, 0.0350, 0.0748, 0.0310, 0.0132, 0.0175], device='cuda:1'), in_proj_covar=tensor([0.0210, 0.0210, 0.0204, 0.0263, 0.0253, 0.0237, 0.0190, 0.0249], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:1') 2023-05-19 03:18:54,497 INFO [finetune.py:992] (1/2) Epoch 20, batch 10400, loss[loss=0.1537, simple_loss=0.2275, pruned_loss=0.04, over 11834.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.2508, pruned_loss=0.03507, over 2366944.67 frames. 
], batch size: 26, lr: 3.06e-03, grad_scale: 8.0 2023-05-19 03:18:54,764 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7057, 3.2305, 5.1994, 2.6602, 2.8833, 3.7787, 3.1370, 3.8793], device='cuda:1'), covar=tensor([0.0444, 0.1309, 0.0338, 0.1263, 0.1984, 0.1634, 0.1514, 0.1150], device='cuda:1'), in_proj_covar=tensor([0.0249, 0.0248, 0.0274, 0.0194, 0.0247, 0.0306, 0.0237, 0.0281], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 03:19:12,854 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=342874.0, num_to_drop=1, layers_to_drop={3} 2023-05-19 03:19:24,558 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342891.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:19:27,990 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=342896.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:19:29,315 INFO [finetune.py:992] (1/2) Epoch 20, batch 10450, loss[loss=0.1449, simple_loss=0.2388, pruned_loss=0.02552, over 12341.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.251, pruned_loss=0.03476, over 2378956.66 frames. ], batch size: 31, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:19:36,974 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.491e+02 2.496e+02 2.961e+02 3.391e+02 1.096e+03, threshold=5.923e+02, percent-clipped=0.0 2023-05-19 03:19:38,756 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2023-05-19 03:19:52,278 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.1770, 2.4764, 3.6424, 3.1061, 3.5225, 3.2050, 2.6688, 3.6000], device='cuda:1'), covar=tensor([0.0154, 0.0446, 0.0175, 0.0320, 0.0171, 0.0227, 0.0434, 0.0168], device='cuda:1'), in_proj_covar=tensor([0.0201, 0.0222, 0.0212, 0.0207, 0.0241, 0.0184, 0.0215, 0.0213], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:20:04,783 INFO [finetune.py:992] (1/2) Epoch 20, batch 10500, loss[loss=0.14, simple_loss=0.2185, pruned_loss=0.03079, over 12159.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.2516, pruned_loss=0.03509, over 2378444.26 frames. ], batch size: 29, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:20:18,628 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=342967.0, num_to_drop=1, layers_to_drop={2} 2023-05-19 03:20:39,823 INFO [finetune.py:992] (1/2) Epoch 20, batch 10550, loss[loss=0.149, simple_loss=0.2438, pruned_loss=0.02708, over 10581.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.252, pruned_loss=0.03545, over 2380826.03 frames. ], batch size: 68, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:20:47,693 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.586e+02 3.115e+02 3.687e+02 6.303e+02, threshold=6.229e+02, percent-clipped=2.0 2023-05-19 03:20:52,061 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343015.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 03:21:04,848 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=2.84 vs. 
limit=5.0 2023-05-19 03:21:05,431 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.4325, 2.3609, 3.0903, 4.2856, 2.5044, 4.2954, 4.4070, 4.5145], device='cuda:1'), covar=tensor([0.0170, 0.1520, 0.0609, 0.0187, 0.1274, 0.0268, 0.0221, 0.0109], device='cuda:1'), in_proj_covar=tensor([0.0131, 0.0210, 0.0189, 0.0130, 0.0191, 0.0188, 0.0189, 0.0130], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 03:21:13,087 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([5.5553, 5.0726, 5.5459, 4.8684, 5.1857, 4.9521, 5.5768, 5.1342], device='cuda:1'), covar=tensor([0.0261, 0.0411, 0.0273, 0.0251, 0.0399, 0.0366, 0.0190, 0.0327], device='cuda:1'), in_proj_covar=tensor([0.0293, 0.0292, 0.0320, 0.0289, 0.0286, 0.0289, 0.0266, 0.0240], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 03:21:15,062 INFO [finetune.py:992] (1/2) Epoch 20, batch 10600, loss[loss=0.1703, simple_loss=0.2559, pruned_loss=0.04233, over 12107.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2517, pruned_loss=0.03554, over 2382934.80 frames. ], batch size: 33, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:21:15,889 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=343049.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:21:18,086 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.0181, 4.5246, 4.0477, 4.8084, 4.3676, 2.8565, 3.9453, 2.8926], device='cuda:1'), covar=tensor([0.0948, 0.0753, 0.1362, 0.0546, 0.1175, 0.1769, 0.1216, 0.3412], device='cuda:1'), in_proj_covar=tensor([0.0320, 0.0392, 0.0372, 0.0356, 0.0384, 0.0285, 0.0360, 0.0375], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:21:28,983 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=343068.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:21:51,285 INFO [finetune.py:992] (1/2) Epoch 20, batch 10650, loss[loss=0.2534, simple_loss=0.3343, pruned_loss=0.08619, over 8343.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.253, pruned_loss=0.03626, over 2361367.70 frames. ], batch size: 97, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:21:55,148 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-05-19 03:21:58,825 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.742e+02 2.486e+02 2.923e+02 3.502e+02 1.456e+03, threshold=5.845e+02, percent-clipped=2.0 2023-05-19 03:22:13,796 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=343129.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:22:23,708 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-05-19 03:22:26,636 INFO [finetune.py:992] (1/2) Epoch 20, batch 10700, loss[loss=0.1752, simple_loss=0.2686, pruned_loss=0.04095, over 12344.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.2529, pruned_loss=0.03594, over 2370576.34 frames. 
], batch size: 36, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:22:41,206 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=343169.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:22:56,354 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343191.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:00,020 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343196.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:01,316 INFO [finetune.py:992] (1/2) Epoch 20, batch 10750, loss[loss=0.1603, simple_loss=0.249, pruned_loss=0.03576, over 12048.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.2523, pruned_loss=0.03552, over 2377848.01 frames. ], batch size: 31, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:23:09,307 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.782e+02 3.148e+02 3.630e+02 1.010e+03, threshold=6.296e+02, percent-clipped=3.0 2023-05-19 03:23:30,563 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343239.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:34,102 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343244.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:23:36,797 INFO [finetune.py:992] (1/2) Epoch 20, batch 10800, loss[loss=0.1466, simple_loss=0.2272, pruned_loss=0.03297, over 11996.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.2516, pruned_loss=0.03563, over 2376900.06 frames. ], batch size: 28, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:23:42,830 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-05-19 03:23:53,430 INFO [scaling.py:679] (1/2) Whitening: num_groups=1, num_channels=384, metric=4.83 vs. limit=5.0 2023-05-19 03:23:58,153 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-05-19 03:24:12,047 INFO [finetune.py:992] (1/2) Epoch 20, batch 10850, loss[loss=0.1779, simple_loss=0.2676, pruned_loss=0.04411, over 12064.00 frames. ], tot_loss[loss=0.1613, simple_loss=0.2519, pruned_loss=0.03539, over 2378046.31 frames. ], batch size: 40, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:24:19,436 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.628e+02 2.620e+02 3.050e+02 3.784e+02 5.779e+02, threshold=6.100e+02, percent-clipped=0.0 2023-05-19 03:24:47,526 INFO [finetune.py:992] (1/2) Epoch 20, batch 10900, loss[loss=0.159, simple_loss=0.2535, pruned_loss=0.03221, over 11666.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.2521, pruned_loss=0.03569, over 2368457.67 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:24:48,381 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343349.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:24:54,101 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. limit=2.0 2023-05-19 03:25:21,806 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343397.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:25:22,437 INFO [finetune.py:992] (1/2) Epoch 20, batch 10950, loss[loss=0.1466, simple_loss=0.2257, pruned_loss=0.03372, over 12264.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.2523, pruned_loss=0.03598, over 2372955.70 frames. 
], batch size: 28, lr: 3.06e-03, grad_scale: 16.0 2023-05-19 03:25:30,884 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.725e+02 2.591e+02 3.071e+02 3.722e+02 9.742e+02, threshold=6.142e+02, percent-clipped=3.0 2023-05-19 03:25:41,167 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=343424.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:25:57,585 INFO [finetune.py:992] (1/2) Epoch 20, batch 11000, loss[loss=0.2201, simple_loss=0.2987, pruned_loss=0.07078, over 8384.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2546, pruned_loss=0.03733, over 2347761.52 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:26:12,360 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343469.0, num_to_drop=1, layers_to_drop={1} 2023-05-19 03:26:32,802 INFO [finetune.py:992] (1/2) Epoch 20, batch 11050, loss[loss=0.1732, simple_loss=0.2716, pruned_loss=0.03736, over 12345.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2572, pruned_loss=0.03909, over 2305094.61 frames. ], batch size: 36, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:26:40,287 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 3.090e+02 3.545e+02 4.332e+02 7.495e+02, threshold=7.090e+02, percent-clipped=4.0 2023-05-19 03:26:46,607 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343517.0, num_to_drop=1, layers_to_drop={0} 2023-05-19 03:27:03,107 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.7515, 4.3123, 4.5665, 4.6568, 4.4500, 4.6499, 4.6350, 2.6986], device='cuda:1'), covar=tensor([0.0099, 0.0115, 0.0111, 0.0070, 0.0074, 0.0127, 0.0079, 0.0844], device='cuda:1'), in_proj_covar=tensor([0.0075, 0.0087, 0.0091, 0.0080, 0.0066, 0.0102, 0.0088, 0.0105], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:1') 2023-05-19 03:27:07,655 INFO [finetune.py:992] (1/2) Epoch 20, batch 11100, loss[loss=0.1646, simple_loss=0.2629, pruned_loss=0.03313, over 12296.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.2592, pruned_loss=0.04011, over 2270529.60 frames. ], batch size: 34, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:27:42,224 INFO [finetune.py:992] (1/2) Epoch 20, batch 11150, loss[loss=0.1871, simple_loss=0.2819, pruned_loss=0.04619, over 12041.00 frames. ], tot_loss[loss=0.1763, simple_loss=0.2653, pruned_loss=0.04364, over 2204228.10 frames. ], batch size: 45, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:27:50,618 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.181e+02 3.272e+02 3.831e+02 4.519e+02 7.897e+02, threshold=7.662e+02, percent-clipped=5.0 2023-05-19 03:28:03,971 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.1634, 4.5695, 4.0683, 4.8838, 4.4995, 2.8902, 4.0411, 3.0550], device='cuda:1'), covar=tensor([0.0902, 0.0813, 0.1432, 0.0570, 0.1230, 0.1810, 0.1170, 0.3555], device='cuda:1'), in_proj_covar=tensor([0.0317, 0.0386, 0.0368, 0.0351, 0.0380, 0.0281, 0.0355, 0.0370], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1') 2023-05-19 03:28:15,338 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-05-19 03:28:16,986 INFO [finetune.py:992] (1/2) Epoch 20, batch 11200, loss[loss=0.2957, simple_loss=0.3553, pruned_loss=0.1181, over 6720.00 frames. ], tot_loss[loss=0.1853, simple_loss=0.2735, pruned_loss=0.04854, over 2129886.14 frames. 
], batch size: 101, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:28:51,746 INFO [finetune.py:992] (1/2) Epoch 20, batch 11250, loss[loss=0.2923, simple_loss=0.3563, pruned_loss=0.1142, over 6462.00 frames. ], tot_loss[loss=0.1911, simple_loss=0.2785, pruned_loss=0.05182, over 2081074.84 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:28:58,901 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.331e+02 3.392e+02 3.781e+02 4.930e+02 8.926e+02, threshold=7.563e+02, percent-clipped=5.0 2023-05-19 03:29:09,070 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=343724.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:29:13,905 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-05-19 03:29:17,821 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-19 03:29:25,763 INFO [finetune.py:992] (1/2) Epoch 20, batch 11300, loss[loss=0.2349, simple_loss=0.3168, pruned_loss=0.07646, over 6944.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2852, pruned_loss=0.05633, over 2002964.27 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:29:41,945 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=343772.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:29:52,856 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.8385, 3.7086, 3.8193, 3.5155, 3.7570, 3.5069, 3.7956, 3.6187], device='cuda:1'), covar=tensor([0.0580, 0.0544, 0.0551, 0.0375, 0.0496, 0.0523, 0.0533, 0.1587], device='cuda:1'), in_proj_covar=tensor([0.0284, 0.0283, 0.0311, 0.0280, 0.0279, 0.0281, 0.0260, 0.0232], device='cuda:1'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1') 2023-05-19 03:29:59,082 INFO [finetune.py:992] (1/2) Epoch 20, batch 11350, loss[loss=0.2516, simple_loss=0.3343, pruned_loss=0.08443, over 6787.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2895, pruned_loss=0.05846, over 1965451.70 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:30:07,467 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.355e+02 3.459e+02 3.988e+02 4.868e+02 7.641e+02, threshold=7.976e+02, percent-clipped=1.0 2023-05-19 03:30:28,344 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-05-19 03:30:33,786 INFO [finetune.py:992] (1/2) Epoch 20, batch 11400, loss[loss=0.1704, simple_loss=0.261, pruned_loss=0.03995, over 12085.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2925, pruned_loss=0.06044, over 1938728.98 frames. ], batch size: 32, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:30:46,481 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-05-19 03:31:01,126 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=343888.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:31:07,518 INFO [finetune.py:992] (1/2) Epoch 20, batch 11450, loss[loss=0.2018, simple_loss=0.2886, pruned_loss=0.05744, over 11775.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2959, pruned_loss=0.06302, over 1905517.12 frames. 
], batch size: 44, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:31:14,913 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.474e+02 3.415e+02 3.844e+02 4.424e+02 1.075e+03, threshold=7.689e+02, percent-clipped=3.0 2023-05-19 03:31:41,722 INFO [finetune.py:992] (1/2) Epoch 20, batch 11500, loss[loss=0.2013, simple_loss=0.2912, pruned_loss=0.0557, over 10349.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2986, pruned_loss=0.06544, over 1857420.38 frames. ], batch size: 69, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:31:42,568 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=343949.0, num_to_drop=0, layers_to_drop=set() 2023-05-19 03:31:46,394 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.3727, 4.3247, 4.2492, 3.8470, 3.9247, 4.3208, 4.0649, 3.9149], device='cuda:1'), covar=tensor([0.0932, 0.1110, 0.0744, 0.1489, 0.2670, 0.0957, 0.1472, 0.1205], device='cuda:1'), in_proj_covar=tensor([0.0632, 0.0575, 0.0533, 0.0643, 0.0436, 0.0748, 0.0787, 0.0569], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:1') 2023-05-19 03:32:08,540 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-05-19 03:32:15,439 INFO [finetune.py:992] (1/2) Epoch 20, batch 11550, loss[loss=0.1879, simple_loss=0.2754, pruned_loss=0.05022, over 11533.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.3001, pruned_loss=0.06674, over 1836752.58 frames. ], batch size: 48, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:32:25,562 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.444e+02 3.824e+02 4.615e+02 8.446e+02, threshold=7.648e+02, percent-clipped=4.0 2023-05-19 03:32:41,563 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-19 03:32:51,706 INFO [finetune.py:992] (1/2) Epoch 20, batch 11600, loss[loss=0.1958, simple_loss=0.2848, pruned_loss=0.05337, over 10354.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.3003, pruned_loss=0.06747, over 1825364.75 frames. ], batch size: 68, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:32:56,195 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-05-19 03:33:14,859 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-05-19 03:33:27,056 INFO [finetune.py:992] (1/2) Epoch 20, batch 11650, loss[loss=0.2019, simple_loss=0.2883, pruned_loss=0.05778, over 11103.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.3009, pruned_loss=0.06846, over 1796324.33 frames. ], batch size: 55, lr: 3.05e-03, grad_scale: 16.0 2023-05-19 03:33:34,721 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.379e+02 3.969e+02 4.667e+02 7.645e+02, threshold=7.937e+02, percent-clipped=0.0 2023-05-19 03:34:01,213 INFO [finetune.py:992] (1/2) Epoch 20, batch 11700, loss[loss=0.2278, simple_loss=0.3145, pruned_loss=0.07052, over 10419.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.3013, pruned_loss=0.06969, over 1771397.87 frames. 
], batch size: 68, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:34:16,929 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=344172.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:34:18,183 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6405, 4.3514, 4.6076, 4.1985, 4.4143, 4.2430, 4.6243, 4.2782], device='cuda:1'), covar=tensor([0.0394, 0.0444, 0.0374, 0.0275, 0.0469, 0.0382, 0.0272, 0.0678], device='cuda:1'), in_proj_covar=tensor([0.0277, 0.0276, 0.0303, 0.0274, 0.0272, 0.0274, 0.0254, 0.0226], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1')
2023-05-19 03:34:35,122 INFO [finetune.py:992] (1/2) Epoch 20, batch 11750, loss[loss=0.2419, simple_loss=0.3073, pruned_loss=0.08826, over 7020.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.3022, pruned_loss=0.07081, over 1732881.70 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:34:42,390 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.504e+02 3.507e+02 3.969e+02 4.748e+02 1.022e+03, threshold=7.938e+02, percent-clipped=5.0
2023-05-19 03:34:58,257 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=344233.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:35:05,837 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=344244.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:35:08,360 INFO [finetune.py:992] (1/2) Epoch 20, batch 11800, loss[loss=0.1933, simple_loss=0.2823, pruned_loss=0.05214, over 12086.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.3036, pruned_loss=0.07202, over 1714160.79 frames. ], batch size: 42, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:35:25,184 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0
2023-05-19 03:35:25,668 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8326, 2.2289, 2.6466, 2.9116, 2.1860, 2.9828, 2.8981, 3.0307], device='cuda:1'), covar=tensor([0.0246, 0.1149, 0.0523, 0.0243, 0.1113, 0.0318, 0.0388, 0.0183], device='cuda:1'), in_proj_covar=tensor([0.0125, 0.0204, 0.0183, 0.0125, 0.0185, 0.0180, 0.0181, 0.0126], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1')
2023-05-19 03:35:41,984 INFO [finetune.py:992] (1/2) Epoch 20, batch 11850, loss[loss=0.2555, simple_loss=0.3343, pruned_loss=0.08834, over 6457.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.3052, pruned_loss=0.07276, over 1690962.08 frames. ], batch size: 98, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:35:49,730 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.265e+02 3.358e+02 3.867e+02 4.638e+02 9.968e+02, threshold=7.734e+02, percent-clipped=3.0
2023-05-19 03:36:03,573 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.8850, 4.5036, 4.2035, 4.2080, 4.5629, 3.9841, 4.1484, 4.0093], device='cuda:1'), covar=tensor([0.1562, 0.1055, 0.1232, 0.1629, 0.0920, 0.1924, 0.1670, 0.1351], device='cuda:1'), in_proj_covar=tensor([0.0361, 0.0504, 0.0410, 0.0446, 0.0467, 0.0441, 0.0401, 0.0386], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:1')
2023-05-19 03:36:15,896 INFO [finetune.py:992] (1/2) Epoch 20, batch 11900, loss[loss=0.2072, simple_loss=0.3128, pruned_loss=0.05074, over 11617.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.305, pruned_loss=0.07182, over 1678143.91 frames. ], batch size: 48, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:36:17,381 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=344350.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:36:18,737 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8847, 3.1524, 2.3933, 2.1802, 2.8604, 2.4083, 3.0300, 2.6456], device='cuda:1'), covar=tensor([0.0598, 0.0531, 0.1031, 0.1580, 0.0299, 0.1144, 0.0502, 0.0821], device='cuda:1'), in_proj_covar=tensor([0.0183, 0.0251, 0.0174, 0.0197, 0.0142, 0.0179, 0.0194, 0.0173], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1')
2023-05-19 03:36:49,636 INFO [finetune.py:992] (1/2) Epoch 20, batch 11950, loss[loss=0.2315, simple_loss=0.3247, pruned_loss=0.06917, over 7082.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.3022, pruned_loss=0.06917, over 1668984.53 frames. ], batch size: 97, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:36:57,298 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 3.070e+02 3.404e+02 4.045e+02 7.617e+02, threshold=6.808e+02, percent-clipped=0.0
2023-05-19 03:36:58,833 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=344411.0, num_to_drop=1, layers_to_drop={2}
2023-05-19 03:37:02,103 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.8614, 2.5465, 3.5723, 3.5963, 2.9596, 2.6814, 2.6490, 2.4996], device='cuda:1'), covar=tensor([0.1428, 0.2606, 0.0600, 0.0551, 0.1046, 0.2438, 0.3050, 0.3673], device='cuda:1'), in_proj_covar=tensor([0.0314, 0.0392, 0.0281, 0.0308, 0.0282, 0.0327, 0.0407, 0.0381], device='cuda:1'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:1')
2023-05-19 03:37:03,042 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0
2023-05-19 03:37:12,475 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.7125, 2.5167, 2.8970, 3.6410, 2.2771, 3.7734, 3.7603, 3.8227], device='cuda:1'), covar=tensor([0.0174, 0.1199, 0.0532, 0.0204, 0.1380, 0.0316, 0.0237, 0.0156], device='cuda:1'), in_proj_covar=tensor([0.0124, 0.0204, 0.0183, 0.0124, 0.0185, 0.0179, 0.0180, 0.0125], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:1')
2023-05-19 03:37:24,243 INFO [finetune.py:992] (1/2) Epoch 20, batch 12000, loss[loss=0.1937, simple_loss=0.2792, pruned_loss=0.05411, over 11058.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2979, pruned_loss=0.06571, over 1685589.55 frames. ], batch size: 55, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:37:24,243 INFO [finetune.py:1017] (1/2) Computing validation loss
2023-05-19 03:37:30,195 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([4.6948, 4.6137, 4.7456, 4.7192, 4.4387, 4.3991, 4.3445, 4.4938], device='cuda:1'), covar=tensor([0.0773, 0.0648, 0.0710, 0.0607, 0.1843, 0.1809, 0.0621, 0.1342], device='cuda:1'), in_proj_covar=tensor([0.0536, 0.0707, 0.0616, 0.0628, 0.0834, 0.0731, 0.0561, 0.0475], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:1')
2023-05-19 03:37:33,031 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([1.6891, 3.3998, 3.6451, 3.9046, 2.5690, 3.4647, 2.1783, 3.2909], device='cuda:1'), covar=tensor([0.2418, 0.1299, 0.1015, 0.0540, 0.1738, 0.0987, 0.2744, 0.1057], device='cuda:1'), in_proj_covar=tensor([0.0233, 0.0274, 0.0299, 0.0361, 0.0244, 0.0244, 0.0263, 0.0369], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:1')
2023-05-19 03:37:41,837 INFO [finetune.py:1026] (1/2) Epoch 20, validation: loss=0.2856, simple_loss=0.3597, pruned_loss=0.1057, over 1020973.00 frames.
2023-05-19 03:37:41,838 INFO [finetune.py:1027] (1/2) Maximum memory allocated so far is 12856MB
2023-05-19 03:37:49,478 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=192, metric=1.68 vs. limit=2.0
2023-05-19 03:38:13,168 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.7833, 3.0472, 2.4087, 2.3089, 2.7754, 2.3947, 2.9679, 2.6469], device='cuda:1'), covar=tensor([0.0626, 0.0556, 0.0982, 0.1366, 0.0314, 0.1181, 0.0478, 0.0797], device='cuda:1'), in_proj_covar=tensor([0.0182, 0.0249, 0.0172, 0.0196, 0.0140, 0.0178, 0.0192, 0.0172], device='cuda:1'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:1')
2023-05-19 03:38:15,499 INFO [finetune.py:992] (1/2) Epoch 20, batch 12050, loss[loss=0.2008, simple_loss=0.2839, pruned_loss=0.05886, over 7112.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2938, pruned_loss=0.0629, over 1684992.53 frames. ], batch size: 97, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:38:22,541 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.838e+02 2.796e+02 3.485e+02 4.183e+02 9.162e+02, threshold=6.969e+02, percent-clipped=1.0
2023-05-19 03:38:27,138 INFO [scaling.py:679] (1/2) Whitening: num_groups=8, num_channels=96, metric=1.14 vs. limit=2.0
2023-05-19 03:38:34,562 INFO [zipformer.py:625] (1/2) warmup_begin=1333.3, warmup_end=2000.0, batch_count=344528.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:38:44,379 INFO [zipformer.py:625] (1/2) warmup_begin=2000.0, warmup_end=2666.7, batch_count=344544.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:38:46,873 INFO [finetune.py:992] (1/2) Epoch 20, batch 12100, loss[loss=0.1963, simple_loss=0.2856, pruned_loss=0.05348, over 12124.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2934, pruned_loss=0.06229, over 1688720.14 frames. ], batch size: 38, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:38:59,467 INFO [zipformer.py:625] (1/2) warmup_begin=2666.7, warmup_end=3333.3, batch_count=344567.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:39:15,639 INFO [zipformer.py:625] (1/2) warmup_begin=666.7, warmup_end=1333.3, batch_count=344592.0, num_to_drop=0, layers_to_drop=set()
2023-05-19 03:39:19,324 INFO [finetune.py:992] (1/2) Epoch 20, batch 12150, loss[loss=0.1874, simple_loss=0.2836, pruned_loss=0.04558, over 11837.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2942, pruned_loss=0.06259, over 1693063.90 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:39:24,917 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([2.5174, 2.0425, 2.9171, 2.4460, 2.8151, 2.8156, 1.9177, 2.8787], device='cuda:1'), covar=tensor([0.0188, 0.0583, 0.0215, 0.0311, 0.0218, 0.0216, 0.0536, 0.0204], device='cuda:1'), in_proj_covar=tensor([0.0183, 0.0204, 0.0192, 0.0189, 0.0219, 0.0168, 0.0198, 0.0194], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:1')
2023-05-19 03:39:26,613 INFO [optim.py:368] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.168e+02 3.232e+02 3.821e+02 4.297e+02 9.292e+02, threshold=7.641e+02, percent-clipped=1.0
2023-05-19 03:39:38,527 INFO [zipformer.py:625] (1/2) warmup_begin=3333.3, warmup_end=4000.0, batch_count=344628.0, num_to_drop=1, layers_to_drop={0}
2023-05-19 03:39:46,426 INFO [zipformer.py:1454] (1/2) attn_weights_entropy = tensor([3.4136, 3.0672, 3.7243, 2.3196, 2.6153, 3.1147, 2.8642, 3.1389], device='cuda:1'), covar=tensor([0.0592, 0.1066, 0.0364, 0.1437, 0.1987, 0.1417, 0.1390, 0.1230], device='cuda:1'), in_proj_covar=tensor([0.0229, 0.0230, 0.0249, 0.0182, 0.0230, 0.0280, 0.0219, 0.0259], device='cuda:1'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:1')
2023-05-19 03:39:50,398 INFO [finetune.py:992] (1/2) Epoch 20, batch 12200, loss[loss=0.2602, simple_loss=0.3311, pruned_loss=0.09464, over 7071.00 frames. ], tot_loss[loss=0.211, simple_loss=0.295, pruned_loss=0.06346, over 1678964.32 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 16.0
2023-05-19 03:40:11,116 INFO [finetune.py:1268] (1/2) Done!
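A note on the recurring [optim.py:368] lines above. They summarize how the optimizer monitors gradient norms during fine-tuning: the five "grad-norm quartiles" values appear to be the minimum, 25th percentile, median, 75th percentile and maximum of recently observed total gradient norms, the threshold equals Clipping_scale times the median (e.g. 2.0 x 3.969e+02 = 7.938e+02 in the first such entry), and percent-clipped appears to report how often gradients were scaled down over the recent reporting window. The sketch below is a minimal, self-contained illustration of this quartile-based clipping idea, not icefall's actual ScaledAdam implementation; the QuartileGradClipper name, the window size and the reporting cadence are assumptions made for the example.

from collections import deque

import torch


class QuartileGradClipper:
    # Illustrative sketch of quartile-based gradient clipping in the spirit of
    # the "Clipping_scale=..., grad-norm quartiles ..." log lines above.
    # Assumptions: window size, log_interval and all names are hypothetical.

    def __init__(self, clipping_scale=2.0, window=128, log_interval=50):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total gradient norms
        self.log_interval = log_interval
        self.num_steps = 0
        self.num_clipped = 0

    def __call__(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return 0.0
        # Total L2 norm of all gradients for this step.
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(total_norm)
        self.num_steps += 1

        # Quartiles (min, 25%, median, 75%, max) of the recent norm history.
        history = sorted(self.norms)
        n = len(history)
        quartiles = [history[int(q * (n - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # clipping_scale * median

        if total_norm > threshold:
            self.num_clipped += 1
            scale = threshold / total_norm
            for p in params:
                p.grad.detach().mul_(scale)  # scale gradients down in place

        if self.num_steps % self.log_interval == 0:
            pct = 100.0 * self.num_clipped / self.log_interval
            print(
                f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                + " ".join(f"{q:.3e}" for q in quartiles)
                + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}"
            )
            self.num_clipped = 0

        return total_norm

A typical call site would be clipper(model.parameters()) immediately before optimizer.step(); during the first few steps the quartiles come from a very short history, so the threshold only becomes meaningful once the window has filled.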