|
2024-08-06 09:38:21,642 INFO [train_bf16.py:997] (2/4) Training started |
|
2024-08-06 09:38:21,643 INFO [train_bf16.py:1007] (2/4) Device: cuda:2 |
|
2024-08-06 09:38:21,643 INFO [train_bf16.py:1009] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'ff1d435a8d3c4eaa15828a84a7240678a70539a7', 'k2-git-date': 'Fri Feb 23 01:48:38 2024', 'lhotse-version': '1.24.0.dev+git.5cae6234.dirty', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_pretraining_frame_level', 'icefall-git-sha1': '66819490-dirty', 'icefall-git-date': 'Mon Aug 5 12:42:46 2024', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD_streaming', 'k2-path': '/star-xy/softwares/pyenvs/k2_cuda11/k2_cuda11/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_weighted_sampler/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-101-0520193029-75fc9fd4b6-l86sv'}, 'world_size': 4, 'master_port': 14394, 'tensorboard': True, 'num_epochs': 100, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 15.0, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': True, 'full_bf16': True, 'use_KD': False, 'use_beats': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'output_downsampling_factor': 2, 'num_events': 527, 'feature_dim': 128, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank_as_ced_mAP50'), 'max_duration': 1000, 'weighted_sampler': True, 'num_samples': 200000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'features_mask_size': 27, 'frames_mask_size': 192, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'} |
|
2024-08-06 09:38:21,643 INFO [train_bf16.py:1011] (2/4) About to create model |
|
2024-08-06 09:38:27,365 INFO [train_bf16.py:1015] (2/4) Number of model parameters: 64559366 |
|
2024-08-06 09:38:27,365 INFO [train_bf16.py:1034] (2/4) Training using: torch.bfloat16 |
|
2024-08-06 09:38:31,668 INFO [train_bf16.py:1039] (2/4) Using DDP |
|
2024-08-06 09:38:33,505 INFO [at_datamodule.py:466] (2/4) About to get the audioset training cuts. |
|
2024-08-06 09:38:33,506 INFO [at_datamodule.py:486] (2/4) Start to load data/fbank_as_ced_mAP50/cuts_audioset_full-with-CED-embeddings.jsonl.gz |
|
2024-08-06 09:42:08,706 INFO [at_datamodule.py:488] (2/4) Get 1904746 cuts in total. |
|
2024-08-06 09:42:08,706 INFO [at_datamodule.py:249] (2/4) Enable MUSAN |
|
2024-08-06 09:42:08,706 INFO [at_datamodule.py:250] (2/4) About to get Musan cuts |
|
2024-08-06 09:42:10,385 INFO [at_datamodule.py:274] (2/4) Enable SpecAugment |
|
2024-08-06 09:42:10,385 INFO [at_datamodule.py:275] (2/4) Time warp factor: 80 |
|
2024-08-06 09:42:10,385 INFO [at_datamodule.py:285] (2/4) Num frame mask: 10 |
|
2024-08-06 09:42:10,386 INFO [at_datamodule.py:298] (2/4) About to create train dataset |
|
2024-08-06 09:42:10,386 INFO [at_datamodule.py:337] (2/4) Using weighted SimpleCutSampler |
|
2024-08-06 09:42:10,386 INFO [at_datamodule.py:501] (2/4) About to get the sampling weight for full in AudioSet |
|
2024-08-06 09:42:12,393 INFO [at_datamodule.py:511] (2/4) Get the sampling weight for 1904746 cuts |
|
2024-08-06 09:42:13,626 INFO [at_datamodule.py:355] (2/4) About to create train dataloader |
|
2024-08-06 09:42:13,627 INFO [at_datamodule.py:494] (2/4) About to get audioset eval all cuts |
|
2024-08-06 09:42:13,656 INFO [at_datamodule.py:386] (2/4) About to create dev dataset |
|
2024-08-06 09:42:14,218 INFO [at_datamodule.py:403] (2/4) About to create dev dataloader |
|
2024-08-06 09:43:23,451 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,455 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,456 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 192], will continue. |
|
2024-08-06 09:43:23,457 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 128], will continue. |
|
2024-08-06 09:43:23,461 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,462 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,465 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,472 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,475 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,476 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,485 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,490 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 128], will continue. |
|
2024-08-06 09:43:23,495 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,498 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 128], will continue. |
|
2024-08-06 09:43:23,501 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,503 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,506 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,511 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,514 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,516 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,520 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,521 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,524 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 384], will continue. |
|
2024-08-06 09:43:23,533 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[62, 99, 512], will continue. |
|
2024-08-06 09:43:23,540 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,542 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 288], will continue. |
|
2024-08-06 09:43:23,549 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,550 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,552 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 128], will continue. |
|
2024-08-06 09:43:23,553 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,555 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,558 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 384], will continue. |
|
2024-08-06 09:43:23,560 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[124, 99, 128], will continue. |
|
2024-08-06 09:43:23,570 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,572 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 256], will continue. |
|
2024-08-06 09:43:23,576 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 192], will continue. |
|
2024-08-06 09:43:23,578 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[248, 99, 128], will continue. |
|
2024-08-06 09:43:23,582 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[496, 99, 192], will continue. |
|
2024-08-06 09:43:23,586 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[496, 99, 192], will continue. |
|
2024-08-06 09:43:23,591 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[496, 99, 192], will continue. |
|
2024-08-06 09:43:23,592 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[496, 99, 192], will continue. |
|
2024-08-06 09:43:23,595 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[496, 99, 144], will continue. |
|
2024-08-06 09:43:23,596 INFO [scaling_bf16.py:1037] (2/4) Caught exception in Whiten backward: , size=[496, 99, 192], will continue. |
|
2024-08-06 09:43:23,874 INFO [checkpoint.py:75] (2/4) Saving checkpoint to zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16/bad-model-2.pt |
|
2024-08-06 09:43:25,676 INFO [train_bf16.py:1175] (2/4) Saving batch to zipformer/exp_at_full_lr_epochs_15_specaug1_frame192_feature27_musan1_weighted1_md1000_bf16/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt |
|
2024-08-06 09:43:25,998 INFO [train_bf16.py:1181] (2/4) features shape: torch.Size([99, 1000, 128]) |
|
|