2024-08-06 09:38:21,682 INFO [train_bf16.py:997] (0/4) Training started
2024-08-06 09:38:21,766 INFO [train_bf16.py:1007] (0/4) Device: cuda:0
2024-08-06 09:38:21,766 INFO [train_bf16.py:1009] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'ff1d435a8d3c4eaa15828a84a7240678a70539a7', 'k2-git-date': 'Fri Feb 23 01:48:38 2024', 'lhotse-version': '1.24.0.dev+git.5cae6234.dirty', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_pretraining_frame_level', 'icefall-git-sha1': '66819490-dirty', 'icefall-git-date': 'Mon Aug 5 12:42:46 2024', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD_streaming', 'k2-path': '/star-xy/softwares/pyenvs/k2_cuda11/k2_cuda11/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_weighted_sampler/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-101-0520193029-75fc9fd4b6-l86sv'}, 'world_size': 4, 'master_port': 14394, 'tensorboard': True, 'num_epochs': 100, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 15.0, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': True, 'full_bf16': True, 'use_KD': False, 'use_beats': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'output_downsampling_factor': 2, 'num_events': 527, 'feature_dim': 128, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank_as_ced_mAP50'), 'max_duration': 1000, 'weighted_sampler': True, 'num_samples': 200000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'features_mask_size': 27, 'frames_mask_size': 192, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2024-08-06 09:38:21,766 INFO [train_bf16.py:1011] (0/4) About to create model
2024-08-06 09:38:27,128 INFO [train_bf16.py:1015] (0/4) Number of model parameters: 64559366
2024-08-06 09:38:31,126 INFO [train_bf16.py:1034] (0/4) Training using: torch.bfloat16
2024-08-06 09:38:31,939 INFO [train_bf16.py:1039] (0/4) Using DDP
2024-08-06 09:38:33,506 INFO [at_datamodule.py:466] (0/4) About to get the audioset training cuts.
2024-08-06 09:38:33,506 INFO [at_datamodule.py:486] (0/4) Start to load data/fbank_as_ced_mAP50/cuts_audioset_full-with-CED-embeddings.jsonl.gz
2024-08-06 09:42:04,245 INFO [at_datamodule.py:488] (0/4) Get 1904746 cuts in total.
2024-08-06 09:42:04,246 INFO [at_datamodule.py:249] (0/4) Enable MUSAN
2024-08-06 09:42:04,246 INFO [at_datamodule.py:250] (0/4) About to get Musan cuts
2024-08-06 09:42:05,918 INFO [at_datamodule.py:274] (0/4) Enable SpecAugment
2024-08-06 09:42:05,919 INFO [at_datamodule.py:275] (0/4) Time warp factor: 80
2024-08-06 09:42:05,919 INFO [at_datamodule.py:285] (0/4) Num frame mask: 10
2024-08-06 09:42:05,919 INFO [at_datamodule.py:298] (0/4) About to create train dataset
2024-08-06 09:42:05,919 INFO [at_datamodule.py:337] (0/4) Using weighted SimpleCutSampler
2024-08-06 09:42:05,920 INFO [at_datamodule.py:501] (0/4) About to get the sampling weight for full in AudioSet
2024-08-06 09:42:07,951 INFO [at_datamodule.py:511] (0/4) Get the sampling weight for 1904746 cuts
2024-08-06 09:42:09,347 INFO [at_datamodule.py:355] (0/4) About to create train dataloader
2024-08-06 09:42:09,351 INFO [at_datamodule.py:494] (0/4) About to get audioset eval all cuts
2024-08-06 09:42:09,377 INFO [at_datamodule.py:386] (0/4) About to create dev dataset
2024-08-06 09:42:09,932 INFO [at_datamodule.py:403] (0/4) About to create dev dataloader
2024-08-06 09:43:20,436 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,439 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,440 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 192], will continue.
2024-08-06 09:43:20,441 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 128], will continue.
2024-08-06 09:43:20,444 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,445 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,447 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,454 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,456 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,458 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,469 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,474 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:20,479 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,483 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:20,486 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,488 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,491 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,497 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,500 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,502 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,506 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:23,492 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:23,495 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 384], will continue.
2024-08-06 09:43:23,504 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:23,511 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,513 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 288], will continue.
2024-08-06 09:43:23,520 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,521 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,523 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:23,524 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,525 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,528 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,531 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:23,541 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:23,543 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:23,547 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 192], will continue.
2024-08-06 09:43:23,549 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[248, 100, 128], will continue.
2024-08-06 09:43:23,553 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,558 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,565 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,566 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,570 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[496, 100, 144], will continue.
2024-08-06 09:43:23,570 INFO [scaling_bf16.py:1037] (0/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,874 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16/bad-model-0.pt
2024-08-06 09:43:26,278 INFO [train_bf16.py:1175] (0/4) Saving batch to zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
2024-08-06 09:43:26,481 INFO [train_bf16.py:1181] (0/4) features shape: torch.Size([100, 1000, 128])