2024-08-06 09:38:21,634 INFO [train_bf16.py:997] (1/4) Training started
2024-08-06 09:38:21,634 INFO [train_bf16.py:1007] (1/4) Device: cuda:1
2024-08-06 09:38:21,635 INFO [train_bf16.py:1009] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'ff1d435a8d3c4eaa15828a84a7240678a70539a7', 'k2-git-date': 'Fri Feb 23 01:48:38 2024', 'lhotse-version': '1.24.0.dev+git.5cae6234.dirty', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_pretraining_frame_level', 'icefall-git-sha1': '66819490-dirty', 'icefall-git-date': 'Mon Aug 5 12:42:46 2024', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD_streaming', 'k2-path': '/star-xy/softwares/pyenvs/k2_cuda11/k2_cuda11/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_weighted_sampler/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-101-0520193029-75fc9fd4b6-l86sv'}, 'world_size': 4, 'master_port': 14394, 'tensorboard': True, 'num_epochs': 100, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 15.0, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': True, 'full_bf16': True, 'use_KD': False, 'use_beats': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'output_downsampling_factor': 2, 'num_events': 527, 'feature_dim': 128, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank_as_ced_mAP50'), 'max_duration': 1000, 'weighted_sampler': True, 'num_samples': 200000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'features_mask_size': 27, 'frames_mask_size': 192, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2024-08-06 09:38:21,635 INFO [train_bf16.py:1011] (1/4) About to create model
2024-08-06 09:38:26,299 INFO [train_bf16.py:1015] (1/4) Number of model parameters: 64559366
2024-08-06 09:38:26,299 INFO [train_bf16.py:1034] (1/4) Training using: torch.bfloat16
2024-08-06 09:38:31,347 INFO [train_bf16.py:1039] (1/4) Using DDP
2024-08-06 09:38:33,505 INFO [at_datamodule.py:466] (1/4) About to get the audioset training cuts.
2024-08-06 09:38:33,505 INFO [at_datamodule.py:486] (1/4) Start to load data/fbank_as_ced_mAP50/cuts_audioset_full-with-CED-embeddings.jsonl.gz
2024-08-06 09:42:06,703 INFO [at_datamodule.py:488] (1/4) Get 1904746 cuts in total.
2024-08-06 09:42:06,704 INFO [at_datamodule.py:249] (1/4) Enable MUSAN
2024-08-06 09:42:06,704 INFO [at_datamodule.py:250] (1/4) About to get Musan cuts
2024-08-06 09:42:08,227 INFO [at_datamodule.py:274] (1/4) Enable SpecAugment
2024-08-06 09:42:08,227 INFO [at_datamodule.py:275] (1/4) Time warp factor: 80
2024-08-06 09:42:08,227 INFO [at_datamodule.py:285] (1/4) Num frame mask: 10
2024-08-06 09:42:08,227 INFO [at_datamodule.py:298] (1/4) About to create train dataset
2024-08-06 09:42:08,228 INFO [at_datamodule.py:337] (1/4) Using weighted SimpleCutSampler
2024-08-06 09:42:08,228 INFO [at_datamodule.py:501] (1/4) About to get the sampling weight for full in AudioSet
2024-08-06 09:42:10,237 INFO [at_datamodule.py:511] (1/4) Get the sampling weight for 1904746 cuts
2024-08-06 09:42:11,611 INFO [at_datamodule.py:355] (1/4) About to create train dataloader
2024-08-06 09:42:11,611 INFO [at_datamodule.py:494] (1/4) About to get audioset eval all cuts
2024-08-06 09:42:11,616 INFO [at_datamodule.py:386] (1/4) About to create dev dataset
2024-08-06 09:42:12,176 INFO [at_datamodule.py:403] (1/4) About to create dev dataloader
2024-08-06 09:43:20,462 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,466 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,467 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 192], will continue.
2024-08-06 09:43:20,469 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 128], will continue.
2024-08-06 09:43:20,472 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,473 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,476 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:20,484 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,486 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,488 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,496 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,502 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:20,508 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:20,511 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:20,520 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,522 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,525 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,531 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,534 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,536 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:20,540 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:23,492 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:23,498 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 384], will continue.
2024-08-06 09:43:23,513 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[62, 100, 512], will continue.
2024-08-06 09:43:23,524 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,526 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 288], will continue.
2024-08-06 09:43:23,536 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,537 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,540 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:23,542 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,543 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,547 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 384], will continue.
2024-08-06 09:43:23,550 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[124, 100, 128], will continue.
2024-08-06 09:43:23,581 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:23,585 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 256], will continue.
2024-08-06 09:43:23,593 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 192], will continue.
2024-08-06 09:43:23,597 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[248, 100, 128], will continue.
2024-08-06 09:43:23,606 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,613 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,621 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,623 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,628 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[496, 100, 144], will continue.
2024-08-06 09:43:23,628 INFO [scaling_bf16.py:1037] (1/4) Caught exception in Whiten backward: , size=[496, 100, 192], will continue.
2024-08-06 09:43:23,874 INFO [checkpoint.py:75] (1/4) Saving checkpoint to zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16/bad-model-1.pt
2024-08-06 09:43:25,796 INFO [train_bf16.py:1175] (1/4) Saving batch to zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
2024-08-06 09:43:26,244 INFO [train_bf16.py:1181] (1/4) features shape: torch.Size([100, 1000, 128])