2023-12-20 17:30:48,670 INFO [train.py:953] (2/4) Training started
2023-12-20 17:30:48,670 INFO [train.py:963] (2/4) Device: cuda:2
2023-12-20 17:30:48,670 INFO [train.py:965] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-clean', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-1218101249-5bcbfb5567-jsftr', 'IP address': '10.177.6.147'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-20 17:30:48,670 INFO [train.py:967] (2/4) About to create model
2023-12-20 17:30:54,289 INFO [train.py:971] (2/4) Number of model parameters: 64264454
2023-12-20 17:30:56,960 INFO [train.py:986] (2/4) Using DDP
2023-12-20 17:30:57,435 INFO [at_datamodule.py:398] (2/4) About to get the audioset cuts for KD.
2023-12-20 17:30:57,498 INFO [at_datamodule.py:223] (2/4) Enable MUSAN
2023-12-20 17:30:57,498 INFO [at_datamodule.py:224] (2/4) About to get Musan cuts
2023-12-20 17:30:59,983 INFO [at_datamodule.py:248] (2/4) Enable SpecAugment
2023-12-20 17:30:59,983 INFO [at_datamodule.py:249] (2/4) Time warp factor: 80
2023-12-20 17:30:59,984 INFO [at_datamodule.py:259] (2/4) Num frame mask: 10
2023-12-20 17:30:59,984 INFO [at_datamodule.py:272] (2/4) About to create train dataset
2023-12-20 17:30:59,984 INFO [at_datamodule.py:299] (2/4) Using DynamicBucketingSampler.
2023-12-20 17:31:02,097 INFO [at_datamodule.py:315] (2/4) About to create train dataloader
2023-12-20 17:31:02,098 INFO [at_datamodule.py:410] (2/4) About to get test-other cuts
2023-12-20 17:31:02,100 INFO [at_datamodule.py:346] (2/4) About to create dev dataset
2023-12-20 17:31:02,576 INFO [at_datamodule.py:363] (2/4) About to create dev dataloader
2023-12-20 17:31:25,020 INFO [train.py:886] (2/4) Epoch 1, batch 0, loss[loss=2.283, audio_tagging_loss=2.283, over 20581.00 frames. ], tot_loss[loss=2.283, audio_tagging_loss=2.283, over 20581.00 frames. ], batch size: 106, lr: 2.25e-02, grad_scale: 2.0
2023-12-20 17:31:25,021 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:31:46,185 INFO [train.py:917] (2/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames.
2023-12-20 17:31:46,186 INFO [train.py:918] (2/4) Maximum memory allocated so far is 13081MB
2023-12-20 17:31:48,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:50,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3
2023-12-20 17:31:53,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=0.0, ans=0.9
2023-12-20 17:31:54,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:56,787 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+02 8.568e+02 1.002e+03 1.369e+03 1.715e+03, threshold=4.006e+03, percent-clipped=0.0
2023-12-20 17:31:58,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=66.66666666666667, ans=0.8976666666666667
2023-12-20 17:31:59,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=148.42 vs. limit=7.525
2023-12-20 17:32:01,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=66.66666666666667, ans=0.496875
2023-12-20 17:32:07,424 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 3.256e+02 7.044e+02 1.161e+03 1.783e+03, threshold=2.818e+03, percent-clipped=0.0
2023-12-20 17:32:09,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=133.33333333333334, ans=0.7513333333333333
2023-12-20 17:32:13,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=60.35 vs. limit=4.053333333333334
2023-12-20 17:32:27,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=60.54 vs. limit=7.575
2023-12-20 17:32:30,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=222.77 vs. limit=5.133333333333334
2023-12-20 17:32:30,896 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 1.290e+02 2.793e+02 8.337e+02 1.783e+03, threshold=1.117e+03, percent-clipped=0.0
2023-12-20 17:32:32,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=17.53 vs. limit=7.6
2023-12-20 17:32:33,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266.6666666666667, ans=0.4875
2023-12-20 17:32:34,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=313.04 vs. limit=7.6
2023-12-20 17:32:37,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=21.24 vs. limit=4.1066666666666665
2023-12-20 17:32:38,469 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=110.90 vs. limit=7.7
2023-12-20 17:32:39,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=111.36 vs. limit=4.1066666666666665
2023-12-20 17:32:40,041 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:32:41,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=255.36 vs. limit=7.75
2023-12-20 17:32:41,374 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=366.89 vs. limit=7.625
2023-12-20 17:32:42,072 INFO [train.py:886] (2/4) Epoch 1, batch 50, loss[loss=0.06074, audio_tagging_loss=0.06074, over 25000.00 frames. ], tot_loss[loss=0.3051, audio_tagging_loss=0.3051, over 1114689.49 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-20 17:33:00,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=54.11 vs. limit=7.63
2023-12-20 17:33:07,720 INFO [train.py:886] (2/4) Epoch 2, batch 0, loss[loss=0.06753, audio_tagging_loss=0.06753, over 21552.00 frames. ], tot_loss[loss=0.06753, audio_tagging_loss=0.06753, over 21552.00 frames. ], batch size: 106, lr: 2.44e-02, grad_scale: 4.0
2023-12-20 17:33:07,721 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:33:15,956 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0380, 5.2993, 4.8652, 5.2336], device='cuda:2')
2023-12-20 17:33:23,169 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.8107, 4.8136, 4.8127, 4.8180], device='cuda:2')
2023-12-20 17:33:28,174 INFO [train.py:917] (2/4) Epoch 2, validation: loss=0.0597, audio_tagging_loss=0.0597, over 3737520.00 frames.
2023-12-20 17:33:28,175 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14643MB
2023-12-20 17:33:32,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=213.93 vs. limit=5.173333333333334
2023-12-20 17:33:36,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=496.21 vs. limit=7.63
2023-12-20 17:33:40,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=78.05 vs. limit=7.655
2023-12-20 17:33:41,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=332.07 vs. limit=7.81
2023-12-20 17:33:42,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=413.56 vs. limit=7.655
2023-12-20 17:33:47,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=40.32 vs. limit=7.655
2023-12-20 17:33:50,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=230.04 vs. limit=7.655
2023-12-20 17:33:51,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=263.30 vs. limit=7.68
2023-12-20 17:33:51,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=480.0, ans=7.68
2023-12-20 17:33:52,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=480.0, ans=0.4775
2023-12-20 17:34:03,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=3.082
2023-12-20 17:34:06,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=546.6666666666666, ans=0.17950000000000002
2023-12-20 17:34:06,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=200.49 vs. limit=5.273333333333333
2023-12-20 17:34:14,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=4.245333333333333
2023-12-20 17:34:19,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=4.245333333333333
2023-12-20 17:34:21,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=20.41 vs. limit=5.153333333333333
2023-12-20 17:34:25,459 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.968e+01 6.154e+01 2.791e+02 2.019e+03, threshold=1.231e+02, percent-clipped=1.0
2023-12-20 17:34:26,577 INFO [train.py:886] (2/4) Epoch 2, batch 50, loss[loss=0.05319, audio_tagging_loss=0.05319, over 25000.00 frames. ], tot_loss[loss=0.05741, audio_tagging_loss=0.05741, over 1123870.55 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 2.0
2023-12-20 17:34:44,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=301.26 vs. limit=7.76
2023-12-20 17:34:52,047 INFO [train.py:886] (2/4) Epoch 3, batch 0, loss[loss=0.06629, audio_tagging_loss=0.06629, over 20834.00 frames. ], tot_loss[loss=0.06629, audio_tagging_loss=0.06629, over 20834.00 frames. ], batch size: 106, lr: 2.54e-02, grad_scale: 4.0
2023-12-20 17:34:52,048 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:35:12,448 INFO [train.py:917] (2/4) Epoch 3, validation: loss=0.05878, audio_tagging_loss=0.05878, over 3737520.00 frames.
2023-12-20 17:35:12,448 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14778MB
2023-12-20 17:35:13,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=693.3333333333334, ans=0.4675
2023-12-20 17:35:22,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=4.277333333333333
2023-12-20 17:35:29,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=760.0, ans=5.475
2023-12-20 17:35:31,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=146.93 vs. limit=8.07
2023-12-20 17:35:33,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=760.0, ans=0.1715
2023-12-20 17:35:41,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=182.75 vs. limit=7.81
2023-12-20 17:35:45,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=238.44 vs. limit=7.81
2023-12-20 17:35:48,261 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=22.94 vs. limit=7.835
2023-12-20 17:35:51,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=893.3333333333334, ans=0.458125
2023-12-20 17:35:55,313 WARNING [optim.py:500] (2/4) Scaling gradients by 0.09217905253171921, model_norm_threshold=123.07855224609375
2023-12-20 17:35:55,463 WARNING [optim.py:572] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.614e+05, grad_sumsq=6.752e+08, orig_rms_sq=1.276e-03
2023-12-20 17:35:55,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=348.28 vs. limit=7.835
2023-12-20 17:35:56,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=893.3333333333334, ans=0.458125
2023-12-20 17:36:06,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=320.93 vs. limit=7.86
2023-12-20 17:36:09,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=159.67 vs. limit=5.48
2023-12-20 17:36:11,142 INFO [train.py:886] (2/4) Epoch 3, batch 50, loss[loss=0.0548, audio_tagging_loss=0.0548, over 25000.00 frames. ], tot_loss[loss=0.05632, audio_tagging_loss=0.05632, over 1116987.31 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:36:11,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=200.40 vs. limit=5.513333333333334
2023-12-20 17:36:35,802 INFO [train.py:886] (2/4) Epoch 4, batch 0, loss[loss=0.05267, audio_tagging_loss=0.05267, over 25000.00 frames. ], tot_loss[loss=0.05267, audio_tagging_loss=0.05267, over 25000.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 8.0
2023-12-20 17:36:35,803 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:36:54,828 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2928, 5.0731, 4.1999, 4.6702], device='cuda:2')
2023-12-20 17:36:55,849 INFO [train.py:917] (2/4) Epoch 4, validation: loss=0.05673, audio_tagging_loss=0.05673, over 3737520.00 frames.
2023-12-20 17:36:55,850 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14778MB
2023-12-20 17:37:05,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=8.28
2023-12-20 17:37:11,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=134.90 vs. limit=5.553333333333334
2023-12-20 17:37:11,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=103.01 vs. limit=7.915
2023-12-20 17:37:15,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=4.442666666666667
2023-12-20 17:37:17,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=4.442666666666667
2023-12-20 17:37:18,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=152.91 vs. limit=5.553333333333334
2023-12-20 17:37:19,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=105.25 vs. limit=8.33
2023-12-20 17:37:22,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1173.3333333333333, ans=0.8589333333333333
2023-12-20 17:37:22,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1173.3333333333333, ans=0.156
2023-12-20 17:37:22,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=35.86 vs. limit=5.293333333333333
2023-12-20 17:37:24,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1173.3333333333333, ans=0.8589333333333333
2023-12-20 17:37:27,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1173.3333333333333, ans=0.445
2023-12-20 17:37:30,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=325.36 vs. limit=8.38
2023-12-20 17:37:31,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=238.31 vs. limit=8.43
2023-12-20 17:37:33,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240.0, ans=0.28759999999999997
2023-12-20 17:37:33,470 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=114.42 vs. limit=7.965
2023-12-20 17:37:34,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=332.68 vs. limit=7.965
2023-12-20 17:37:38,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=162.86 vs. limit=7.965
2023-12-20 17:37:40,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=131.95 vs. limit=7.965
2023-12-20 17:37:42,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=64.16 vs. limit=5.653333333333333
2023-12-20 17:37:49,920 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.504e+01 2.720e+01 3.182e+01 1.335e+03, threshold=5.440e+01, percent-clipped=1.0
2023-12-20 17:37:54,275 INFO [train.py:886] (2/4) Epoch 4, batch 50, loss[loss=0.05111, audio_tagging_loss=0.05111, over 25000.00 frames. ], tot_loss[loss=0.05369, audio_tagging_loss=0.05369, over 1121668.45 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 4.0
2023-12-20 17:38:12,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=295.48 vs. limit=8.02
2023-12-20 17:38:19,535 INFO [train.py:886] (2/4) Epoch 5, batch 0, loss[loss=0.06715, audio_tagging_loss=0.06715, over 20425.00 frames. ], tot_loss[loss=0.06715, audio_tagging_loss=0.06715, over 20425.00 frames. ], batch size: 106, lr: 2.59e-02, grad_scale: 8.0
2023-12-20 17:38:19,536 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:38:39,889 INFO [train.py:917] (2/4) Epoch 5, validation: loss=0.05523, audio_tagging_loss=0.05523, over 3737520.00 frames.
2023-12-20 17:38:39,890 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14778MB
2023-12-20 17:38:44,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=4.554666666666667
2023-12-20 17:38:47,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1386.6666666666667, ans=0.0688
2023-12-20 17:38:54,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1453.3333333333333, ans=0.431875
2023-12-20 17:39:00,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2023-12-20 17:39:05,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=96.82 vs. limit=8.64
2023-12-20 17:39:10,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1520.0, ans=5.76
2023-12-20 17:39:14,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1520.0, ans=0.2228
2023-12-20 17:39:16,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=49.08 vs. limit=8.095
2023-12-20 17:39:21,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=458.13 vs. limit=8.095
2023-12-20 17:39:38,872 INFO [train.py:886] (2/4) Epoch 5, batch 50, loss[loss=0.05064, audio_tagging_loss=0.05064, over 25000.00 frames. ], tot_loss[loss=0.05248, audio_tagging_loss=0.05248, over 1117817.77 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 8.0
2023-12-20 17:39:39,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=78.15 vs. limit=5.86
2023-12-20 17:40:04,929 INFO [train.py:886] (2/4) Epoch 6, batch 0, loss[loss=0.04925, audio_tagging_loss=0.04925, over 25000.00 frames. ], tot_loss[loss=0.04925, audio_tagging_loss=0.04925, over 25000.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 16.0
2023-12-20 17:40:04,930 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:40:19,729 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0755, 3.7526, 5.2326, 3.6878], device='cuda:2')
2023-12-20 17:40:25,816 INFO [train.py:917] (2/4) Epoch 6, validation: loss=0.05425, audio_tagging_loss=0.05425, over 3737520.00 frames.
2023-12-20 17:40:25,817 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:40:26,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=81.13 vs. limit=8.8
2023-12-20 17:40:28,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1733.3333333333333, ans=0.08916666666666667
2023-12-20 17:40:29,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=22.31 vs. limit=5.433333333333334
2023-12-20 17:40:36,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=4.72
2023-12-20 17:40:37,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1800.0, ans=0.837
2023-12-20 17:40:52,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-20 17:40:52,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1866.6666666666667, ans=0.8346666666666667
2023-12-20 17:40:53,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=319.38 vs. limit=8.2
2023-12-20 17:40:58,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866.6666666666667, ans=0.2813333333333333
2023-12-20 17:41:03,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=122.10 vs. limit=8.95
2023-12-20 17:41:07,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-20 17:41:07,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1933.3333333333333, ans=0.409375
2023-12-20 17:41:10,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1933.3333333333333, ans=0.05650000000000001
2023-12-20 17:41:10,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=17.73 vs. limit=5.483333333333333
2023-12-20 17:41:13,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=3.3
2023-12-20 17:41:14,959 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.556e+01 2.831e+01 3.472e+01 7.747e+01, threshold=5.662e+01, percent-clipped=6.0
2023-12-20 17:41:18,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2000.0, ans=0.8300000000000001
2023-12-20 17:41:20,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=64.62 vs. limit=9.0
2023-12-20 17:41:23,850 INFO [train.py:886] (2/4) Epoch 6, batch 50, loss[loss=0.04601, audio_tagging_loss=0.04601, over 25000.00 frames. ], tot_loss[loss=0.0512, audio_tagging_loss=0.0512, over 1124299.94 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 16.0
2023-12-20 17:41:49,202 INFO [train.py:886] (2/4) Epoch 7, batch 0, loss[loss=0.05184, audio_tagging_loss=0.05184, over 24101.00 frames. ], tot_loss[loss=0.05184, audio_tagging_loss=0.05184, over 24101.00 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 32.0
2023-12-20 17:41:49,202 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:42:09,822 INFO [train.py:917] (2/4) Epoch 7, validation: loss=0.05269, audio_tagging_loss=0.05269, over 3737520.00 frames.
2023-12-20 17:42:09,823 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:42:11,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2080.0, ans=0.40249999999999997
2023-12-20 17:42:13,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=39.36 vs. limit=8.28
2023-12-20 17:42:13,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=286.41 vs. limit=8.28
2023-12-20 17:42:21,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2146.6666666666665, ans=0.399375
2023-12-20 17:42:26,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=8.305
2023-12-20 17:42:30,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=4.429333333333333
2023-12-20 17:42:35,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=25.24 vs. limit=9.16
2023-12-20 17:42:37,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2213.3333333333335, ans=0.39625
2023-12-20 17:43:07,588 INFO [train.py:886] (2/4) Epoch 7, batch 50, loss[loss=0.04344, audio_tagging_loss=0.04344, over 25000.00 frames. ], tot_loss[loss=0.05087, audio_tagging_loss=0.05087, over 1122403.67 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 1.0
2023-12-20 17:43:08,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=33.27 vs. limit=8.405
2023-12-20 17:43:32,848 INFO [train.py:886] (2/4) Epoch 8, batch 0, loss[loss=0.05077, audio_tagging_loss=0.05077, over 24170.00 frames. ], tot_loss[loss=0.05077, audio_tagging_loss=0.05077, over 24170.00 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 2.0
2023-12-20 17:43:32,848 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:43:47,796 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8993, 4.3654, 3.4870, 3.3905], device='cuda:2')
2023-12-20 17:43:53,651 INFO [train.py:917] (2/4) Epoch 8, validation: loss=0.05155, audio_tagging_loss=0.05155, over 3737520.00 frames.
2023-12-20 17:43:53,652 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:44:23,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=40.97 vs. limit=9.42
2023-12-20 17:44:23,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=5.024
2023-12-20 17:44:37,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2626.6666666666665, ans=0.5
2023-12-20 17:44:42,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2693.3333333333335, ans=0.09899999999999999
2023-12-20 17:44:43,338 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 3.487e+01 4.265e+01 5.657e+01 4.687e+02, threshold=8.530e+01, percent-clipped=24.0
2023-12-20 17:44:43,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2693.3333333333335, ans=0.04158333333333333
2023-12-20 17:44:48,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=26.61 vs. limit=8.51
2023-12-20 17:44:50,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2760.0, ans=0.8034
2023-12-20 17:44:51,056 INFO [train.py:886] (2/4) Epoch 8, batch 50, loss[loss=0.04868, audio_tagging_loss=0.04868, over 25000.00 frames. ], tot_loss[loss=0.04903, audio_tagging_loss=0.04903, over 1126572.41 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 2.0
2023-12-20 17:45:09,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.07 vs. limit=6.386666666666667
2023-12-20 17:45:16,350 INFO [train.py:886] (2/4) Epoch 9, batch 0, loss[loss=0.0511, audio_tagging_loss=0.0511, over 24103.00 frames. ], tot_loss[loss=0.0511, audio_tagging_loss=0.0511, over 24103.00 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 4.0
2023-12-20 17:45:16,350 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:45:37,427 INFO [train.py:917] (2/4) Epoch 9, validation: loss=0.04977, audio_tagging_loss=0.04977, over 3737520.00 frames.
2023-12-20 17:45:37,428 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:45:37,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=92.44 vs. limit=9.58
2023-12-20 17:45:38,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=35.85 vs. limit=8.54
2023-12-20 17:45:52,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=167.20 vs. limit=8.565
2023-12-20 17:45:55,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=25.80 vs. limit=8.565
2023-12-20 17:46:04,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2906.6666666666665, ans=0.36375
2023-12-20 17:46:06,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=135.02 vs. limit=8.59
2023-12-20 17:46:10,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=93.55 vs. limit=8.615
2023-12-20 17:46:14,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=18.12 vs. limit=8.615
2023-12-20 17:46:15,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2973.3333333333335, ans=0.360625
2023-12-20 17:46:19,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=31.90 vs. limit=9.73
2023-12-20 17:46:23,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3040.0, ans=5.76
2023-12-20 17:46:24,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=121.48 vs. limit=8.64
2023-12-20 17:46:30,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=25.05 vs. limit=6.52
2023-12-20 17:46:32,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3106.6666666666665, ans=0.7912666666666667
2023-12-20 17:46:33,270 INFO [train.py:886] (2/4) Epoch 9, batch 50, loss[loss=0.04414, audio_tagging_loss=0.04414, over 25000.00 frames. ], tot_loss[loss=0.04714, audio_tagging_loss=0.04714, over 1123255.65 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:46:33,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=32.84 vs. limit=8.665
2023-12-20 17:46:52,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=5.78
2023-12-20 17:46:59,483 INFO [train.py:886] (2/4) Epoch 10, batch 0, loss[loss=0.04603, audio_tagging_loss=0.04603, over 24103.00 frames. ], tot_loss[loss=0.04603, audio_tagging_loss=0.04603, over 24103.00 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 8.0
2023-12-20 17:46:59,484 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:47:20,692 INFO [train.py:917] (2/4) Epoch 10, validation: loss=0.04858, audio_tagging_loss=0.04858, over 3737520.00 frames.
2023-12-20 17:47:20,693 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:47:21,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=25.96 vs. limit=8.67
2023-12-20 17:47:21,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=5.248
2023-12-20 17:47:31,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3186.6666666666665, ans=0.07966666666666668
2023-12-20 17:47:37,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:45,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3253.3333333333335, ans=0.34750000000000003
2023-12-20 17:47:48,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3253.3333333333335, ans=0.07799999999999999
2023-12-20 17:47:49,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=31.91 vs. limit=8.72
2023-12-20 17:47:58,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.51 vs. limit=9.99
2023-12-20 17:48:01,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3320.0, ans=0.0755
2023-12-20 17:48:03,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=6.66
2023-12-20 17:48:04,134 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.726e+01 4.484e+01 5.424e+01 1.858e+02, threshold=8.969e+01, percent-clipped=3.0
2023-12-20 17:48:15,997 INFO [train.py:886] (2/4) Epoch 10, batch 50, loss[loss=0.04358, audio_tagging_loss=0.04358, over 25000.00 frames. ], tot_loss[loss=0.0462, audio_tagging_loss=0.0462, over 1119906.38 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 8.0
2023-12-20 17:48:40,825 INFO [train.py:886] (2/4) Epoch 11, batch 0, loss[loss=0.04723, audio_tagging_loss=0.04723, over 24078.00 frames. ], tot_loss[loss=0.04723, audio_tagging_loss=0.04723, over 24078.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:48:40,826 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:48:53,636 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3948, 2.3390, 3.4762, 2.4205], device='cuda:2')
2023-12-20 17:49:01,994 INFO [train.py:917] (2/4) Epoch 11, validation: loss=0.04728, audio_tagging_loss=0.04728, over 3737520.00 frames.
2023-12-20 17:49:01,995 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:49:05,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3466.6666666666665, ans=0.7846666666666666
2023-12-20 17:49:10,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.75 vs. limit=10.1
2023-12-20 17:49:13,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=5.413333333333333
2023-12-20 17:49:22,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:24,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.46 vs. limit=6.766666666666667
2023-12-20 17:49:28,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=6.8
2023-12-20 17:49:31,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125
2023-12-20 17:49:32,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=8.85
2023-12-20 17:49:33,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=43.37 vs. limit=8.85
2023-12-20 17:49:34,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.32 vs. limit=10.2
2023-12-20 17:49:34,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=33.69 vs. limit=10.2
2023-12-20 17:49:40,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.99 vs. limit=10.25
2023-12-20 17:49:42,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3666.6666666666665, ans=0.06249999999999997
2023-12-20 17:49:43,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=62.52 vs. limit=8.875
2023-12-20 17:49:46,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=5.493333333333333
2023-12-20 17:49:48,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=31.55 vs. limit=8.9
2023-12-20 17:49:49,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=8.9
2023-12-20 17:49:53,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=62.67 vs. limit=8.9
2023-12-20 17:49:56,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3733.3333333333335, ans=0.05999999999999997
2023-12-20 17:49:57,433 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.452e+00
2023-12-20 17:49:57,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3800.0, ans=0.014499999999999985
2023-12-20 17:49:57,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=42.27 vs. limit=8.925
2023-12-20 17:49:58,312 INFO [train.py:886] (2/4) Epoch 11, batch 50, loss[loss=0.04255, audio_tagging_loss=0.04255, over 25000.00 frames. ], tot_loss[loss=0.04557, audio_tagging_loss=0.04557, over 1117498.70 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:49:58,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=8.925
2023-12-20 17:50:23,186 INFO [train.py:886] (2/4) Epoch 12, batch 0, loss[loss=0.0448, audio_tagging_loss=0.0448, over 24138.00 frames. ], tot_loss[loss=0.0448, audio_tagging_loss=0.0448, over 24138.00 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:50:23,187 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:50:36,041 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9587, 1.7990, 1.8133, 1.8953], device='cuda:2')
2023-12-20 17:50:44,477 INFO [train.py:917] (2/4) Epoch 12, validation: loss=0.04619, audio_tagging_loss=0.04619, over 3737520.00 frames.
2023-12-20 17:50:44,478 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:50:51,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.73 vs. limit=5.953333333333333
2023-12-20 17:50:51,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=8.93
2023-12-20 17:50:57,718 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=71.88 vs. limit=8.955
2023-12-20 17:50:57,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.99 vs. limit=6.9399999999999995
2023-12-20 17:51:00,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=8.955
2023-12-20 17:51:06,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=8.955
2023-12-20 17:51:07,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3946.6666666666665, ans=0.2592
2023-12-20 17:51:10,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3946.6666666666665, ans=0.05199999999999999
2023-12-20 17:51:11,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=10.46
2023-12-20 17:51:24,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:25,063 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.849e+01 4.841e+01 5.572e+01 8.770e+01, threshold=9.682e+01, percent-clipped=0.0
2023-12-20 17:51:26,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=27.57 vs. limit=9.004999999999999
2023-12-20 17:51:27,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:33,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=7.04
2023-12-20 17:51:35,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4080.0, ans=9.03
2023-12-20 17:51:40,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4146.666666666667, ans=0.04938888888888889
2023-12-20 17:51:40,951 INFO [train.py:886] (2/4) Epoch 12, batch 50, loss[loss=0.04178, audio_tagging_loss=0.04178, over 25000.00 frames. ], tot_loss[loss=0.04376, audio_tagging_loss=0.04376, over 1120004.19 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:52:04,705 INFO [train.py:886] (2/4) Epoch 13, batch 0, loss[loss=0.03956, audio_tagging_loss=0.03956, over 25000.00 frames. ], tot_loss[loss=0.03956, audio_tagging_loss=0.03956, over 25000.00 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:52:04,705 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:52:12,866 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.9063, 1.1653, 1.8288, 1.8131], device='cuda:2')
2023-12-20 17:52:25,607 INFO [train.py:917] (2/4) Epoch 13, validation: loss=0.04525, audio_tagging_loss=0.04525, over 3737520.00 frames.
2023-12-20 17:52:25,608 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:52:25,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4160.0, ans=0.2584
2023-12-20 17:52:25,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4160.0, ans=0.07400000000000001
2023-12-20 17:52:28,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=9.06
2023-12-20 17:52:29,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4160.0, ans=0.0
2023-12-20 17:52:29,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.32 vs. limit=10.620000000000001
2023-12-20 17:52:36,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4226.666666666667, ans=0.009950724637681159
2023-12-20 17:52:39,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=29.03 vs. limit=10.67
2023-12-20 17:52:41,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=10.67
2023-12-20 17:52:42,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=33.46 vs. limit=9.085
2023-12-20 17:52:42,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.62 vs. limit=5.0
2023-12-20 17:52:49,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4293.333333333333, ans=0.07
2023-12-20 17:52:50,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4293.333333333333, ans=0.04949747468305833
2023-12-20 17:52:59,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=9.135
2023-12-20 17:53:04,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=9.135
2023-12-20 17:53:11,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=51.27 vs. limit=9.16
2023-12-20 17:53:15,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4426.666666666667, ans=0.009907246376811594
2023-12-20 17:53:19,091 INFO [train.py:886] (2/4) Epoch 13, batch 50, loss[loss=0.04299, audio_tagging_loss=0.04299, over 25000.00 frames. ], tot_loss[loss=0.04317, audio_tagging_loss=0.04317, over 1121045.90 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:53:43,854 INFO [train.py:886] (2/4) Epoch 14, batch 0, loss[loss=0.04288, audio_tagging_loss=0.04288, over 25000.00 frames. ], tot_loss[loss=0.04288, audio_tagging_loss=0.04288, over 25000.00 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:53:43,854 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:53:55,626 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9270, 1.7806, 2.1339, 1.9149], device='cuda:2')
2023-12-20 17:54:05,165 INFO [train.py:917] (2/4) Epoch 14, validation: loss=0.04503, audio_tagging_loss=0.04503, over 3737520.00 frames.
2023-12-20 17:54:05,166 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:54:11,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=10.879999999999999
2023-12-20 17:54:26,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=9.24
2023-12-20 17:54:29,279 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.038e+01
2023-12-20 17:54:31,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4640.0, ans=0.2825
2023-12-20 17:54:31,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=9.24
2023-12-20 17:54:32,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.14 vs. limit=7.32
2023-12-20 17:54:35,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.44 vs. limit=10.98
2023-12-20 17:54:37,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4706.666666666667, ans=0.27937500000000004
2023-12-20 17:54:38,362 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 4.195e+01 5.214e+01 6.348e+01 1.962e+02, threshold=1.043e+02, percent-clipped=5.0
2023-12-20 17:54:40,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=9.265
2023-12-20 17:54:57,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4840.0, ans=0.273125
2023-12-20 17:54:58,020 INFO [train.py:886] (2/4) Epoch 14, batch 50, loss[loss=0.03772, audio_tagging_loss=0.03772, over 25000.00 frames. ], tot_loss[loss=0.04195, audio_tagging_loss=0.04195, over 1123574.72 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:55:22,498 INFO [train.py:886] (2/4) Epoch 15, batch 0, loss[loss=0.04001, audio_tagging_loss=0.04001, over 25000.00 frames. ], tot_loss[loss=0.04001, audio_tagging_loss=0.04001, over 25000.00 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:55:22,498 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:55:43,372 INFO [train.py:917] (2/4) Epoch 15, validation: loss=0.04452, audio_tagging_loss=0.04452, over 3737520.00 frames.
2023-12-20 17:55:43,373 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:55:44,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4853.333333333333, ans=0.20146666666666668
2023-12-20 17:55:44,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4853.333333333333, ans=9.32
2023-12-20 17:55:47,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=36.73 vs. limit=9.32
2023-12-20 17:55:49,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4853.333333333333, ans=8.033333333333333
2023-12-20 17:55:49,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=27.66 vs. limit=9.32
2023-12-20 17:55:58,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=11.19
2023-12-20 17:56:01,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=11.19
2023-12-20 17:56:16,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=27.61 vs. limit=9.395
2023-12-20 17:56:18,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=9.395
2023-12-20 17:56:23,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5053.333333333333, ans=0.045611111111111116
2023-12-20 17:56:35,386 INFO [train.py:886] (2/4) Epoch 15, batch 50, loss[loss=0.04041, audio_tagging_loss=0.04041, over 25000.00 frames. ], tot_loss[loss=0.04165, audio_tagging_loss=0.04165, over 1115568.63 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:57:00,247 INFO [train.py:886] (2/4) Epoch 16, batch 0, loss[loss=0.0395, audio_tagging_loss=0.0395, over 25000.00 frames. ], tot_loss[loss=0.0395, audio_tagging_loss=0.0395, over 25000.00 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:57:00,248 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:57:12,226 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6515, 2.9810, 2.7109, 3.1295], device='cuda:2')
2023-12-20 17:57:21,257 INFO [train.py:917] (2/4) Epoch 16, validation: loss=0.04383, audio_tagging_loss=0.04383, over 3737520.00 frames.
2023-12-20 17:57:21,257 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:57:22,530 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.040e+02
2023-12-20 17:57:26,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5200.0, ans=0.25625
2023-12-20 17:57:27,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.31 vs. limit=9.45
2023-12-20 17:57:27,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=6.08
2023-12-20 17:57:31,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=9.475
2023-12-20 17:57:38,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=9.475
2023-12-20 17:57:44,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=9.5
2023-12-20 17:57:49,876 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.797e+01 3.933e+01 4.813e+01 5.766e+01 2.623e+02, threshold=9.626e+01, percent-clipped=4.0
2023-12-20 17:57:56,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5400.0, ans=0.7110000000000001
2023-12-20 17:58:12,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5466.666666666667, ans=0.24375000000000002
2023-12-20 17:58:13,998 INFO [train.py:886] (2/4) Epoch 16, batch 50, loss[loss=0.04058, audio_tagging_loss=0.04058, over 25000.00 frames. ], tot_loss[loss=0.04029, audio_tagging_loss=0.04029, over 1124080.10 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:58:14,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5533.333333333333, ans=0.24062499999999998
2023-12-20 17:58:38,072 INFO [train.py:886] (2/4) Epoch 17, batch 0, loss[loss=0.04294, audio_tagging_loss=0.04294, over 24156.00 frames. ], tot_loss[loss=0.04294, audio_tagging_loss=0.04294, over 24156.00 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 17:58:38,072 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 17:58:46,300 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6665, 3.0104, 2.7217, 2.8777], device='cuda:2')
2023-12-20 17:58:59,163 INFO [train.py:917] (2/4) Epoch 17, validation: loss=0.04362, audio_tagging_loss=0.04362, over 3737520.00 frames.
2023-12-20 17:58:59,164 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 17:59:00,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=9.58
2023-12-20 17:59:03,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=9.58
2023-12-20 17:59:12,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=9.605
2023-12-20 17:59:13,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=26.48 vs. limit=9.605
2023-12-20 17:59:16,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5613.333333333333, ans=0.07
2023-12-20 17:59:17,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5613.333333333333, ans=0.7035333333333333
2023-12-20 17:59:18,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=6.272
2023-12-20 17:59:32,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5746.666666666667, ans=0.042722222222222224
2023-12-20 17:59:37,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.72 vs. limit=11.809999999999999
2023-12-20 17:59:38,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=11.809999999999999
2023-12-20 17:59:43,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.49 vs. limit=9.68
2023-12-20 17:59:47,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=6.325333333333333
2023-12-20 17:59:49,926 INFO [train.py:886] (2/4) Epoch 17, batch 50, loss[loss=0.03614, audio_tagging_loss=0.03614, over 25000.00 frames. ], tot_loss[loss=0.03988, audio_tagging_loss=0.03988, over 1120588.04 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 18:00:14,311 INFO [train.py:886] (2/4) Epoch 18, batch 0, loss[loss=0.03981, audio_tagging_loss=0.03981, over 24110.00 frames. ], tot_loss[loss=0.03981, audio_tagging_loss=0.03981, over 24110.00 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:00:14,312 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:00:35,060 INFO [train.py:917] (2/4) Epoch 18, validation: loss=0.04342, audio_tagging_loss=0.04342, over 3737520.00 frames.
2023-12-20 18:00:35,060 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:00:45,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5960.0, ans=0.2404
2023-12-20 18:00:47,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=9.735
2023-12-20 18:00:58,725 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.667e+01 4.319e+01 5.687e+01 1.553e+02, threshold=8.639e+01, percent-clipped=3.0
2023-12-20 18:00:59,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=12.02
2023-12-20 18:01:01,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=6026.666666666667, ans=0.21750000000000003
2023-12-20 18:01:14,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.23 vs. limit=12.07
2023-12-20 18:01:18,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=6160.0, ans=0.21125
2023-12-20 18:01:25,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=9.835
2023-12-20 18:01:25,750 INFO [train.py:886] (2/4) Epoch 18, batch 50, loss[loss=0.03439, audio_tagging_loss=0.03439, over 25000.00 frames. ], tot_loss[loss=0.03833, audio_tagging_loss=0.03833, over 1127202.33 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:01:50,821 INFO [train.py:886] (2/4) Epoch 19, batch 0, loss[loss=0.03398, audio_tagging_loss=0.03398, over 25000.00 frames. ], tot_loss[loss=0.03398, audio_tagging_loss=0.03398, over 25000.00 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:01:50,821 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:02:11,829 INFO [train.py:917] (2/4) Epoch 19, validation: loss=0.04287, audio_tagging_loss=0.04287, over 3737520.00 frames.
2023-12-20 18:02:11,830 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:02:12,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=6240.0, ans=0.20750000000000002
2023-12-20 18:02:22,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=12.23
2023-12-20 18:02:28,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6306.666666666667, ans=0.20437499999999997
2023-12-20 18:02:43,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=9.915
2023-12-20 18:02:45,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=6440.0, ans=0.198125
2023-12-20 18:02:45,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=6440.0, ans=0.6746000000000001
2023-12-20 18:02:48,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=6440.0, ans=0.029875000000000002
2023-12-20 18:02:52,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=9.94
2023-12-20 18:02:55,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=9.94
2023-12-20 18:03:00,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=6506.666666666667, ans=0.195
2023-12-20 18:03:00,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=9.94
2023-12-20 18:03:02,036 INFO [train.py:886] (2/4) Epoch 19, batch 50, loss[loss=0.03557, audio_tagging_loss=0.03557, over 25000.00 frames. ], tot_loss[loss=0.0378, audio_tagging_loss=0.0378, over 1123929.37 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:03:26,294 INFO [train.py:886] (2/4) Epoch 20, batch 0, loss[loss=0.03504, audio_tagging_loss=0.03504, over 25000.00 frames. ], tot_loss[loss=0.03504, audio_tagging_loss=0.03504, over 25000.00 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:03:26,295 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:03:47,095 INFO [train.py:917] (2/4) Epoch 20, validation: loss=0.0429, audio_tagging_loss=0.0429, over 3737520.00 frames.
2023-12-20 18:03:47,095 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:03:48,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=9.97
2023-12-20 18:03:58,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=6653.333333333333, ans=0.188125
2023-12-20 18:04:06,504 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.799e+01 4.551e+01 5.624e+01 1.513e+02, threshold=9.102e+01, percent-clipped=5.0
2023-12-20 18:04:15,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.55 vs. limit=12.54
2023-12-20 18:04:28,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=6.713333333333333
2023-12-20 18:04:37,012 INFO [train.py:886] (2/4) Epoch 20, batch 50, loss[loss=0.03413, audio_tagging_loss=0.03413, over 25000.00 frames. ], tot_loss[loss=0.03747, audio_tagging_loss=0.03747, over 1118978.42 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:04:59,859 INFO [train.py:886] (2/4) Epoch 21, batch 0, loss[loss=0.04612, audio_tagging_loss=0.04612, over 20094.00 frames. ], tot_loss[loss=0.04612, audio_tagging_loss=0.04612, over 20094.00 frames. ], batch size: 106, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:04:59,860 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:05:20,818 INFO [train.py:917] (2/4) Epoch 21, validation: loss=0.0427, audio_tagging_loss=0.0427, over 3737520.00 frames.
2023-12-20 18:05:20,819 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:05:32,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=7000.0, ans=0.655
2023-12-20 18:05:48,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=10.15
2023-12-20 18:05:48,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=12.8
2023-12-20 18:05:50,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=7066.666666666667, ans=0.17933333333333334
2023-12-20 18:05:54,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=7133.333333333333, ans=0.307
2023-12-20 18:05:54,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7133.333333333333, ans=0.0
2023-12-20 18:06:03,488 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.507e+01
2023-12-20 18:06:10,789 INFO [train.py:886] (2/4) Epoch 21, batch 50, loss[loss=0.03124, audio_tagging_loss=0.03124, over 25000.00 frames. ], tot_loss[loss=0.03702, audio_tagging_loss=0.03702, over 1110930.85 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:06:29,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=12.96
2023-12-20 18:06:34,956 INFO [train.py:886] (2/4) Epoch 22, batch 0, loss[loss=0.03299, audio_tagging_loss=0.03299, over 25000.00 frames. ], tot_loss[loss=0.03299, audio_tagging_loss=0.03299, over 25000.00 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 32.0
2023-12-20 18:06:34,957 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:06:49,008 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.5037, 1.4958, 1.3105, 1.2949], device='cuda:2')
2023-12-20 18:06:55,944 INFO [train.py:917] (2/4) Epoch 22, validation: loss=0.04259, audio_tagging_loss=0.04259, over 3737520.00 frames.
2023-12-20 18:06:55,945 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:06:59,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=10.23
2023-12-20 18:07:04,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=7280.0, ans=10.23
2023-12-20 18:07:06,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=10.254999999999999
2023-12-20 18:07:10,812 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.757e+01 4.513e+01 5.428e+01 2.125e+02, threshold=9.026e+01, percent-clipped=5.0
2023-12-20 18:07:12,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=10.254999999999999
2023-12-20 18:07:16,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7413.333333333333, ans=0.15250000000000002
2023-12-20 18:07:21,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=10.28
2023-12-20 18:07:27,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=7480.0, ans=0.009243478260869565
2023-12-20 18:07:28,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7480.0, ans=0.14937499999999998
2023-12-20 18:07:39,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=7546.666666666667, ans=0.035222222222222224
2023-12-20 18:07:44,531 INFO [train.py:886] (2/4) Epoch 22, batch 50, loss[loss=0.03313, audio_tagging_loss=0.03313, over 25000.00 frames. ], tot_loss[loss=0.03545, audio_tagging_loss=0.03545, over 1119124.88 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 32.0
2023-12-20 18:08:02,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=13.219999999999999
2023-12-20 18:08:08,666 INFO [train.py:886] (2/4) Epoch 23, batch 0, loss[loss=0.04406, audio_tagging_loss=0.04406, over 21057.00 frames. ], tot_loss[loss=0.04406, audio_tagging_loss=0.04406, over 21057.00 frames. ], batch size: 106, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:08:08,667 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:08:30,057 INFO [train.py:917] (2/4) Epoch 23, validation: loss=0.04291, audio_tagging_loss=0.04291, over 3737520.00 frames.
2023-12-20 18:08:30,058 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:08:30,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=10.36
2023-12-20 18:08:31,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=7626.666666666667, ans=0.14250000000000002
2023-12-20 18:08:31,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=10.36
2023-12-20 18:08:34,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=10.36
2023-12-20 18:08:45,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=7693.333333333333, ans=0.13937500000000003
2023-12-20 18:08:46,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=10.385
2023-12-20 18:08:56,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=10.41
2023-12-20 18:09:17,955 INFO [train.py:886] (2/4) Epoch 23, batch 50, loss[loss=0.03305, audio_tagging_loss=0.03305, over 25000.00 frames. ], tot_loss[loss=0.03516, audio_tagging_loss=0.03516, over 1116803.91 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:09:40,310 INFO [train.py:886] (2/4) Epoch 24, batch 0, loss[loss=0.04133, audio_tagging_loss=0.04133, over 21728.00 frames. ], tot_loss[loss=0.04133, audio_tagging_loss=0.04133, over 21728.00 frames. ], batch size: 106, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:09:40,310 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:10:01,276 INFO [train.py:917] (2/4) Epoch 24, validation: loss=0.04248, audio_tagging_loss=0.04248, over 3737520.00 frames.
2023-12-20 18:10:01,277 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:10:06,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=7973.333333333333, ans=0.17026666666666668
2023-12-20 18:10:12,545 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.651e+01 4.128e+01 4.777e+01 1.617e+02, threshold=8.255e+01, percent-clipped=1.0
2023-12-20 18:10:14,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:16,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=8040.0, ans=10.515
2023-12-20 18:10:19,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:38,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8173.333333333333, ans=0.21826666666666666
2023-12-20 18:10:43,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8240.0, ans=0.21760000000000002
2023-12-20 18:10:49,639 INFO [train.py:886] (2/4) Epoch 24, batch 50, loss[loss=0.03189, audio_tagging_loss=0.03189, over 25000.00 frames. ], tot_loss[loss=0.03405, audio_tagging_loss=0.03405, over 1119075.99 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:11:13,612 INFO [train.py:886] (2/4) Epoch 25, batch 0, loss[loss=0.0338, audio_tagging_loss=0.0338, over 25000.00 frames. ], tot_loss[loss=0.0338, audio_tagging_loss=0.0338, over 25000.00 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:11:13,613 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:11:34,705 INFO [train.py:917] (2/4) Epoch 25, validation: loss=0.04257, audio_tagging_loss=0.04257, over 3737520.00 frames.
2023-12-20 18:11:34,705 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:11:37,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=10.620000000000001
2023-12-20 18:11:47,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.645
2023-12-20 18:11:55,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=10.67
2023-12-20 18:12:00,768 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.559e+00
2023-12-20 18:12:04,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=7.4079999999999995
2023-12-20 18:12:06,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=8520.0, ans=0.6018
2023-12-20 18:12:10,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8520.0, ans=0.2148
2023-12-20 18:12:10,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=10.695
2023-12-20 18:12:11,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=13.89
2023-12-20 18:12:22,266 INFO [train.py:886] (2/4) Epoch 25, batch 50, loss[loss=0.03228, audio_tagging_loss=0.03228, over 25000.00 frames. ], tot_loss[loss=0.03326, audio_tagging_loss=0.03326, over 1123209.09 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:12:45,044 INFO [train.py:886] (2/4) Epoch 26, batch 0, loss[loss=0.04138, audio_tagging_loss=0.04138, over 20177.00 frames. ], tot_loss[loss=0.04138, audio_tagging_loss=0.04138, over 20177.00 frames. ], batch size: 106, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:12:45,044 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:13:05,884 INFO [train.py:917] (2/4) Epoch 26, validation: loss=0.04241, audio_tagging_loss=0.04241, over 3737520.00 frames.
2023-12-20 18:13:05,885 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:13:11,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=22.46 vs. limit=10.75
2023-12-20 18:13:12,404 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.673e+01 4.044e+01 4.675e+01 8.607e+01, threshold=8.088e+01, percent-clipped=1.0
2023-12-20 18:13:12,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=8666.666666666666, ans=0.5966666666666667
2023-12-20 18:13:15,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667
2023-12-20 18:13:16,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667
2023-12-20 18:13:16,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=8733.333333333334, ans=10.775
2023-12-20 18:13:20,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=8733.333333333334, ans=0.5943333333333334
2023-12-20 18:13:23,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212
2023-12-20 18:13:31,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=10.8
2023-12-20 18:13:41,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=14.15
2023-12-20 18:13:44,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=8933.333333333334, ans=0.125
2023-12-20 18:13:47,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=27.82 vs. limit=10.85
2023-12-20 18:13:52,997 INFO [train.py:886] (2/4) Epoch 26, batch 50, loss[loss=0.02956, audio_tagging_loss=0.02956, over 25000.00 frames. ], tot_loss[loss=0.03224, audio_tagging_loss=0.03224, over 1119526.98 frames. ], batch size: 100, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:14:18,299 INFO [train.py:886] (2/4) Epoch 27, batch 0, loss[loss=0.0303, audio_tagging_loss=0.0303, over 25000.00 frames. ], tot_loss[loss=0.0303, audio_tagging_loss=0.0303, over 25000.00 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:14:18,300 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:14:31,139 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3577, 1.7678, 2.1146, 2.2565], device='cuda:2')
2023-12-20 18:14:37,636 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5915, 3.0487, 2.5976, 2.5353], device='cuda:2')
2023-12-20 18:14:39,325 INFO [train.py:917] (2/4) Epoch 27, validation: loss=0.04294, audio_tagging_loss=0.04294, over 3737520.00 frames.
2023-12-20 18:14:39,326 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:14:44,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=27.24 vs. limit=10.879999999999999
2023-12-20 18:14:46,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=10.879999999999999
2023-12-20 18:14:49,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=14.309999999999999
2023-12-20 18:14:51,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=31.65 vs. limit=10.905
2023-12-20 18:15:09,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=9213.333333333334, ans=0.5775333333333333
2023-12-20 18:15:26,761 INFO [train.py:886] (2/4) Epoch 27, batch 50, loss[loss=0.02842, audio_tagging_loss=0.02842, over 25000.00 frames. ], tot_loss[loss=0.03164, audio_tagging_loss=0.03164, over 1117606.62 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:15:48,260 INFO [train.py:886] (2/4) Epoch 28, batch 0, loss[loss=0.0301, audio_tagging_loss=0.0301, over 25000.00 frames. ], tot_loss[loss=0.0301, audio_tagging_loss=0.0301, over 25000.00 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:15:48,261 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:16:09,705 INFO [train.py:917] (2/4) Epoch 28, validation: loss=0.04282, audio_tagging_loss=0.04282, over 3737520.00 frames.
2023-12-20 18:16:09,705 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:16:12,511 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.970e+01 4.630e+01 5.343e+01 9.281e+01, threshold=9.260e+01, percent-clipped=1.0
2023-12-20 18:16:24,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9426.666666666666, ans=0.20573333333333332
2023-12-20 18:16:32,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=9493.333333333334, ans=0.027111111111111114
2023-12-20 18:16:32,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9493.333333333334, ans=0.125
2023-12-20 18:16:38,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=9560.0, ans=0.008791304347826087
2023-12-20 18:16:54,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=7.406666666666666
2023-12-20 18:16:56,950 INFO [train.py:886] (2/4) Epoch 28, batch 50, loss[loss=0.02563, audio_tagging_loss=0.02563, over 25000.00 frames. ], tot_loss[loss=0.03101, audio_tagging_loss=0.03101, over 1120311.90 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:17:19,793 INFO [train.py:886] (2/4) Epoch 29, batch 0, loss[loss=0.03977, audio_tagging_loss=0.03977, over 20634.00 frames. ], tot_loss[loss=0.03977, audio_tagging_loss=0.03977, over 20634.00 frames. ], batch size: 106, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:17:19,793 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:17:30,680 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5305, 2.3478, 2.4033, 2.3839], device='cuda:2')
2023-12-20 18:17:31,974 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8318, 1.6548, 1.3662, 1.6184, 1.7442, 1.6820, 1.5322, 1.6898], device='cuda:2')
2023-12-20 18:17:40,753 INFO [train.py:917] (2/4) Epoch 29, validation: loss=0.04276, audio_tagging_loss=0.04276, over 3737520.00 frames.
2023-12-20 18:17:40,754 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:17:42,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=11.14
2023-12-20 18:18:06,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=11.19
2023-12-20 18:18:07,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=9840.0, ans=0.5556000000000001
2023-12-20 18:18:11,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=14.93
2023-12-20 18:18:29,134 INFO [train.py:886] (2/4) Epoch 29, batch 50, loss[loss=0.02734, audio_tagging_loss=0.02734, over 25000.00 frames. ], tot_loss[loss=0.02999, audio_tagging_loss=0.02999, over 1118289.13 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:18:29,998 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 4.177e+01 4.600e+01 5.564e+01 7.757e+01, threshold=9.200e+01, percent-clipped=0.0
2023-12-20 18:18:46,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=10053.333333333334, ans=0.09899494936611666
2023-12-20 18:18:51,720 INFO [train.py:886] (2/4) Epoch 30, batch 0, loss[loss=0.03279, audio_tagging_loss=0.03279, over 24101.00 frames. ], tot_loss[loss=0.03279, audio_tagging_loss=0.03279, over 24101.00 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:18:51,721 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:18:59,713 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8432, 2.2270, 2.3182, 2.7913], device='cuda:2')
2023-12-20 18:19:01,818 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7919, 2.4071, 2.4041, 2.8809], device='cuda:2')
2023-12-20 18:19:12,593 INFO [train.py:917] (2/4) Epoch 30, validation: loss=0.04346, audio_tagging_loss=0.04346, over 3737520.00 frames.
2023-12-20 18:19:12,593 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:19:28,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=10120.0, ans=0.125
2023-12-20 18:19:29,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=10120.0, ans=0.125
2023-12-20 18:19:31,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=10186.666666666666, ans=0.125
2023-12-20 18:19:40,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=11.345
2023-12-20 18:19:54,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=10.16
2023-12-20 18:19:57,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.24
2023-12-20 18:19:59,939 INFO [train.py:886] (2/4) Epoch 30, batch 50, loss[loss=0.02758, audio_tagging_loss=0.02758, over 25000.00 frames. ], tot_loss[loss=0.02922, audio_tagging_loss=0.02922, over 1121142.49 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:20:22,405 INFO [train.py:886] (2/4) Epoch 31, batch 0, loss[loss=0.02425, audio_tagging_loss=0.02425, over 25000.00 frames. ], tot_loss[loss=0.02425, audio_tagging_loss=0.02425, over 25000.00 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 32.0
2023-12-20 18:20:22,406 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:20:32,842 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1955, 1.7944, 1.6471, 1.8199, 1.8954, 1.8125, 1.6294, 1.8020], device='cuda:2')
2023-12-20 18:20:33,457 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4324, 2.3833, 2.5484, 2.2773], device='cuda:2')
2023-12-20 18:20:43,502 INFO [train.py:917] (2/4) Epoch 31, validation: loss=0.04363, audio_tagging_loss=0.04363, over 3737520.00 frames.
2023-12-20 18:20:43,503 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:20:50,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=10400.0, ans=0.125
2023-12-20 18:21:01,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=11.425
2023-12-20 18:21:14,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10600.0, ans=0.194
2023-12-20 18:21:17,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=11.475
2023-12-20 18:21:19,805 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.177e-01
2023-12-20 18:21:20,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10600.0, ans=0.125
2023-12-20 18:21:29,122 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.385e+01 4.278e+01 4.904e+01 5.799e+01 1.168e+02, threshold=9.808e+01, percent-clipped=2.0
2023-12-20 18:21:30,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=10666.666666666666, ans=10.0
2023-12-20 18:21:31,839 INFO [train.py:886] (2/4) Epoch 31, batch 50, loss[loss=0.02422, audio_tagging_loss=0.02422, over 25000.00 frames. ], tot_loss[loss=0.02842, audio_tagging_loss=0.02842, over 1117091.33 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 32.0
2023-12-20 18:21:32,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=10733.333333333334, ans=0.5243333333333333
2023-12-20 18:21:54,500 INFO [train.py:886] (2/4) Epoch 32, batch 0, loss[loss=0.03717, audio_tagging_loss=0.03717, over 21316.00 frames. ], tot_loss[loss=0.03717, audio_tagging_loss=0.03717, over 21316.00 frames. ], batch size: 106, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:21:54,500 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:22:10,447 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5819, 2.4984, 2.6485, 2.4092], device='cuda:2')
2023-12-20 18:22:15,977 INFO [train.py:917] (2/4) Epoch 32, validation: loss=0.04494, audio_tagging_loss=0.04494, over 3737520.00 frames.
2023-12-20 18:22:15,977 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:22:16,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=10746.666666666666, ans=0.125
2023-12-20 18:22:20,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.56
2023-12-20 18:22:20,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=10746.666666666666, ans=0.021888888888888892
2023-12-20 18:22:24,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=10813.333333333334, ans=0.125
2023-12-20 18:22:25,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10813.333333333334, ans=0.125
2023-12-20 18:22:37,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=11.58
2023-12-20 18:22:38,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10880.0, ans=0.0
2023-12-20 18:22:40,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10880.0, ans=0.19119999999999998
2023-12-20 18:22:46,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=10946.666666666666, ans=0.125
2023-12-20 18:22:56,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11013.333333333334, ans=0.125
2023-12-20 18:22:58,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11013.333333333334, ans=0.020777777777777773
2023-12-20 18:23:02,816 INFO [train.py:886] (2/4) Epoch 32, batch 50, loss[loss=0.02819, audio_tagging_loss=0.02819, over 25000.00 frames. ], tot_loss[loss=0.02718, audio_tagging_loss=0.02718, over 1121418.55 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:23:23,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11093.333333333334, ans=0.18906666666666666
2023-12-20 18:23:25,183 INFO [train.py:886] (2/4) Epoch 33, batch 0, loss[loss=0.03206, audio_tagging_loss=0.03206, over 21614.00 frames. ], tot_loss[loss=0.03206, audio_tagging_loss=0.03206, over 21614.00 frames. ], batch size: 106, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:23:25,184 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:23:46,120 INFO [train.py:917] (2/4) Epoch 33, validation: loss=0.0459, audio_tagging_loss=0.0459, over 3737520.00 frames.
2023-12-20 18:23:46,121 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:23:49,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=11093.333333333334, ans=0.125
2023-12-20 18:23:59,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=11.684999999999999
2023-12-20 18:24:20,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=11293.333333333334, ans=0.125
2023-12-20 18:24:25,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11360.0, ans=0.18639999999999998
2023-12-20 18:24:25,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=11360.0, ans=0.019333333333333338
2023-12-20 18:24:26,653 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 4.449e+01 5.027e+01 5.967e+01 1.050e+02, threshold=1.005e+02, percent-clipped=1.0
2023-12-20 18:24:32,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11426.666666666666, ans=0.125
2023-12-20 18:24:33,022 INFO [train.py:886] (2/4) Epoch 33, batch 50, loss[loss=0.02471, audio_tagging_loss=0.02471, over 25000.00 frames. ], tot_loss[loss=0.02623, audio_tagging_loss=0.02623, over 1116458.53 frames. ], batch size: 100, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:24:54,841 INFO [train.py:886] (2/4) Epoch 34, batch 0, loss[loss=0.02526, audio_tagging_loss=0.02526, over 25000.00 frames. ], tot_loss[loss=0.02526, audio_tagging_loss=0.02526, over 25000.00 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:24:54,841 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:25:16,061 INFO [train.py:917] (2/4) Epoch 34, validation: loss=0.0463, audio_tagging_loss=0.0463, over 3737520.00 frames.
2023-12-20 18:25:16,062 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:25:29,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=11506.666666666666, ans=0.018722222222222223
2023-12-20 18:25:31,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11506.666666666666, ans=0.18493333333333334
2023-12-20 18:25:51,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=11640.0, ans=0.00833913043478261
2023-12-20 18:25:56,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=11.89
2023-12-20 18:26:02,681 INFO [train.py:886] (2/4) Epoch 34, batch 50, loss[loss=0.02265, audio_tagging_loss=0.02265, over 25000.00 frames. ], tot_loss[loss=0.02531, audio_tagging_loss=0.02531, over 1120059.27 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:26:24,394 INFO [train.py:886] (2/4) Epoch 35, batch 0, loss[loss=0.02218, audio_tagging_loss=0.02218, over 25000.00 frames. ], tot_loss[loss=0.02218, audio_tagging_loss=0.02218, over 25000.00 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:26:24,395 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:26:43,536 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0184, 2.5182, 2.4109, 2.8270], device='cuda:2')
2023-12-20 18:26:45,180 INFO [train.py:917] (2/4) Epoch 35, validation: loss=0.04736, audio_tagging_loss=0.04736, over 3737520.00 frames.
2023-12-20 18:26:45,181 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:27:13,873 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.961e-01
2023-12-20 18:27:20,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=11.995000000000001
2023-12-20 18:27:23,466 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.764e+01 4.533e+01 5.198e+01 5.955e+01 1.043e+02, threshold=1.040e+02, percent-clipped=1.0
2023-12-20 18:27:23,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=12.02
2023-12-20 18:27:32,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12120.0, ans=0.125
2023-12-20 18:27:33,754 INFO [train.py:886] (2/4) Epoch 35, batch 50, loss[loss=0.02357, audio_tagging_loss=0.02357, over 25000.00 frames. ], tot_loss[loss=0.02389, audio_tagging_loss=0.02389, over 1124135.09 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:27:55,038 INFO [train.py:886] (2/4) Epoch 36, batch 0, loss[loss=0.03018, audio_tagging_loss=0.03018, over 20513.00 frames. ], tot_loss[loss=0.03018, audio_tagging_loss=0.03018, over 20513.00 frames. ], batch size: 106, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:27:55,038 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:28:11,987 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0909, 4.4174, 4.4397, 4.3308], device='cuda:2')
2023-12-20 18:28:16,073 INFO [train.py:917] (2/4) Epoch 36, validation: loss=0.04841, audio_tagging_loss=0.04841, over 3737520.00 frames.
2023-12-20 18:28:16,073 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:28:16,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=8.033333333333333
2023-12-20 18:28:19,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=12133.333333333334, ans=0.008231884057971015
2023-12-20 18:28:41,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=8.066666666666666
2023-12-20 18:29:03,192 INFO [train.py:886] (2/4) Epoch 36, batch 50, loss[loss=0.02135, audio_tagging_loss=0.02135, over 25000.00 frames. ], tot_loss[loss=0.02365, audio_tagging_loss=0.02365, over 1119716.98 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:29:24,449 INFO [train.py:886] (2/4) Epoch 37, batch 0, loss[loss=0.02786, audio_tagging_loss=0.02786, over 20498.00 frames. ], tot_loss[loss=0.02786, audio_tagging_loss=0.02786, over 20498.00 frames. ], batch size: 106, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:29:24,450 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:29:34,311 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0687, 4.7948, 4.6310, 4.4593], device='cuda:2')
2023-12-20 18:29:45,679 INFO [train.py:917] (2/4) Epoch 37, validation: loss=0.04928, audio_tagging_loss=0.04928, over 3737520.00 frames.
2023-12-20 18:29:45,680 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:29:46,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=12480.0, ans=0.4632
2023-12-20 18:29:49,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=12480.0, ans=0.125
2023-12-20 18:29:58,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=12546.666666666666, ans=0.125
2023-12-20 18:30:00,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=12546.666666666666, ans=0.008142028985507246
2023-12-20 18:30:02,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=12546.666666666666, ans=0.17453333333333335
2023-12-20 18:30:12,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=22.46 vs. limit=12.23
2023-12-20 18:30:19,005 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.554e+01 4.732e+01 5.545e+01 6.466e+01 1.044e+02, threshold=1.109e+02, percent-clipped=1.0
2023-12-20 18:30:26,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=12746.666666666666, ans=0.125
2023-12-20 18:30:32,823 INFO [train.py:886] (2/4) Epoch 37, batch 50, loss[loss=0.0178, audio_tagging_loss=0.0178, over 25000.00 frames. ], tot_loss[loss=0.02257, audio_tagging_loss=0.02257, over 1116382.77 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:30:55,794 INFO [train.py:886] (2/4) Epoch 38, batch 0, loss[loss=0.02753, audio_tagging_loss=0.02753, over 21206.00 frames. ], tot_loss[loss=0.02753, audio_tagging_loss=0.02753, over 21206.00 frames. ], batch size: 106, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:30:55,794 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:31:16,994 INFO [train.py:917] (2/4) Epoch 38, validation: loss=0.04916, audio_tagging_loss=0.04916, over 3737520.00 frames.
2023-12-20 18:31:16,994 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:31:20,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:21,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:25,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:35,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12960.0, ans=0.125
2023-12-20 18:31:39,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=12960.0, ans=0.09899494936611666
2023-12-20 18:31:41,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=12960.0, ans=0.04949747468305833
2023-12-20 18:32:04,832 INFO [train.py:886] (2/4) Epoch 38, batch 50, loss[loss=0.01835, audio_tagging_loss=0.01835, over 25000.00 frames. ], tot_loss[loss=0.02232, audio_tagging_loss=0.02232, over 1109728.35 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:32:26,418 INFO [train.py:886] (2/4) Epoch 39, batch 0, loss[loss=0.02121, audio_tagging_loss=0.02121, over 24125.00 frames. ], tot_loss[loss=0.02121, audio_tagging_loss=0.02121, over 24125.00 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:32:26,419 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:32:46,472 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6173, 1.9408, 1.7376, 2.2254, 2.0347, 2.0442, 1.8605, 2.0572], device='cuda:2')
2023-12-20 18:32:47,549 INFO [train.py:917] (2/4) Epoch 39, validation: loss=0.05058, audio_tagging_loss=0.05058, over 3737520.00 frames.
2023-12-20 18:32:47,550 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:33:00,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=13240.0, ans=8.31
2023-12-20 18:33:01,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=9.296
2023-12-20 18:33:17,347 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.971e+01 5.139e+01 5.911e+01 6.986e+01 1.449e+02, threshold=1.182e+02, percent-clipped=3.0
2023-12-20 18:33:17,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=13373.333333333334, ans=0.125
2023-12-20 18:33:20,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=13373.333333333334, ans=0.010944444444444444
2023-12-20 18:33:22,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13373.333333333334, ans=0.125
2023-12-20 18:33:29,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=12.54
2023-12-20 18:33:35,368 INFO [train.py:886] (2/4) Epoch 39, batch 50, loss[loss=0.0191, audio_tagging_loss=0.0191, over 25000.00 frames. ], tot_loss[loss=0.02097, audio_tagging_loss=0.02097, over 1117713.56 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:33:57,923 INFO [train.py:886] (2/4) Epoch 40, batch 0, loss[loss=0.02404, audio_tagging_loss=0.02404, over 24092.00 frames. ], tot_loss[loss=0.02404, audio_tagging_loss=0.02404, over 24092.00 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:33:57,924 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:34:19,043 INFO [train.py:917] (2/4) Epoch 40, validation: loss=0.05208, audio_tagging_loss=0.05208, over 3737520.00 frames.
2023-12-20 18:34:19,043 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:34:21,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13520.0, ans=0.1648
2023-12-20 18:34:23,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=12.57
2023-12-20 18:34:41,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=12.620000000000001
2023-12-20 18:34:58,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=13786.666666666666, ans=0.4174666666666667
2023-12-20 18:35:06,527 INFO [train.py:886] (2/4) Epoch 40, batch 50, loss[loss=0.01745, audio_tagging_loss=0.01745, over 25000.00 frames. ], tot_loss[loss=0.01985, audio_tagging_loss=0.01985, over 1124183.14 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:35:29,528 INFO [train.py:886] (2/4) Epoch 41, batch 0, loss[loss=0.02016, audio_tagging_loss=0.02016, over 25000.00 frames. ], tot_loss[loss=0.02016, audio_tagging_loss=0.02016, over 25000.00 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:35:29,528 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:35:48,061 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3372, 3.0105, 3.3948, 3.0879], device='cuda:2')
2023-12-20 18:35:50,405 INFO [train.py:917] (2/4) Epoch 41, validation: loss=0.05259, audio_tagging_loss=0.05259, over 3737520.00 frames.
2023-12-20 18:35:50,406 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:36:02,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=13933.333333333334, ans=0.008611111111111104
2023-12-20 18:36:04,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=12.725
2023-12-20 18:36:16,659 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.775e+01 5.160e+01 5.694e+01 6.780e+01 1.124e+02, threshold=1.139e+02, percent-clipped=0.0
2023-12-20 18:36:23,512 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=12.775
2023-12-20 18:36:31,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=14133.333333333334, ans=0.15866666666666665
2023-12-20 18:36:33,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=14133.333333333334, ans=0.125
2023-12-20 18:36:37,896 INFO [train.py:886] (2/4) Epoch 41, batch 50, loss[loss=0.01783, audio_tagging_loss=0.01783, over 25000.00 frames. ], tot_loss[loss=0.01954, audio_tagging_loss=0.01954, over 1118676.14 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:37:00,638 INFO [train.py:886] (2/4) Epoch 42, batch 0, loss[loss=0.02661, audio_tagging_loss=0.02661, over 19785.00 frames. ], tot_loss[loss=0.02661, audio_tagging_loss=0.02661, over 19785.00 frames. ], batch size: 106, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:37:00,639 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:37:21,716 INFO [train.py:917] (2/4) Epoch 42, validation: loss=0.0541, audio_tagging_loss=0.0541, over 3737520.00 frames.
2023-12-20 18:37:21,717 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:37:27,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=14213.333333333334, ans=0.007779710144927536
2023-12-20 18:37:34,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:37,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=12.855
2023-12-20 18:38:07,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=5.172000000000001
2023-12-20 18:38:09,808 INFO [train.py:886] (2/4) Epoch 42, batch 50, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 1113558.36 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:38:32,308 INFO [train.py:886] (2/4) Epoch 43, batch 0, loss[loss=0.02318, audio_tagging_loss=0.02318, over 21325.00 frames. ], tot_loss[loss=0.02318, audio_tagging_loss=0.02318, over 21325.00 frames. ], batch size: 106, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:38:32,309 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:38:40,419 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4399, 2.6202, 2.8722, 2.6384], device='cuda:2')
2023-12-20 18:38:53,027 INFO [train.py:917] (2/4) Epoch 43, validation: loss=0.05602, audio_tagging_loss=0.05602, over 3737520.00 frames.
2023-12-20 18:38:53,027 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:39:03,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=14626.666666666666, ans=0.38806666666666667
2023-12-20 18:39:16,028 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.316e+01 5.471e+01 6.063e+01 6.688e+01 1.130e+02, threshold=1.213e+02, percent-clipped=0.0
2023-12-20 18:39:19,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=14693.333333333334, ans=0.125
2023-12-20 18:39:29,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=14760.0, ans=0.007660869565217391
2023-12-20 18:39:31,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=14826.666666666666, ans=0.125
2023-12-20 18:39:38,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=13.059999999999999
2023-12-20 18:39:41,479 INFO [train.py:886] (2/4) Epoch 43, batch 50, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 1120921.25 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:40:04,353 INFO [train.py:886] (2/4) Epoch 44, batch 0, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:40:04,354 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:40:25,321 INFO [train.py:917] (2/4) Epoch 44, validation: loss=0.05682, audio_tagging_loss=0.05682, over 3737520.00 frames.
2023-12-20 18:40:25,322 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:40:40,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14973.333333333334, ans=0.15026666666666666
2023-12-20 18:40:52,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15040.0, ans=0.125
2023-12-20 18:40:53,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=5.266
2023-12-20 18:40:57,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=15106.666666666666, ans=0.125
2023-12-20 18:41:12,867 INFO [train.py:886] (2/4) Epoch 44, batch 50, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01714, audio_tagging_loss=0.01714, over 1121136.66 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:41:35,895 INFO [train.py:886] (2/4) Epoch 45, batch 0, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-20 18:41:35,896 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:41:56,899 INFO [train.py:917] (2/4) Epoch 45, validation: loss=0.05811, audio_tagging_loss=0.05811, over 3737520.00 frames.
2023-12-20 18:41:56,900 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:42:15,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=5.298
2023-12-20 18:42:15,214 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.876e+01 5.082e+01 5.625e+01 6.615e+01 1.122e+02, threshold=1.125e+02, percent-clipped=0.0
2023-12-20 18:42:25,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=13.295
2023-12-20 18:42:26,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=15453.333333333334, ans=0.125
2023-12-20 18:42:30,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15453.333333333334, ans=0.14546666666666666
2023-12-20 18:42:35,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=15520.0, ans=0.007495652173913044
2023-12-20 18:42:38,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.23 vs. limit=13.32
2023-12-20 18:42:44,425 INFO [train.py:886] (2/4) Epoch 45, batch 50, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 1124023.84 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 64.0
2023-12-20 18:43:02,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=15600.0, ans=0.0
2023-12-20 18:43:02,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=13.35
2023-12-20 18:43:06,816 INFO [train.py:886] (2/4) Epoch 46, batch 0, loss[loss=0.01788, audio_tagging_loss=0.01788, over 24133.00 frames. ], tot_loss[loss=0.01788, audio_tagging_loss=0.01788, over 24133.00 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:43:06,816 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:43:18,177 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5517, 2.9490, 2.9990, 2.8908], device='cuda:2')
2023-12-20 18:43:27,876 INFO [train.py:917] (2/4) Epoch 46, validation: loss=0.05956, audio_tagging_loss=0.05956, over 3737520.00 frames.
2023-12-20 18:43:27,876 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:43:37,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=15666.666666666666, ans=10.0
2023-12-20 18:43:38,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=15666.666666666666, ans=0.125
2023-12-20 18:43:40,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=15666.666666666666, ans=0.3516666666666667
2023-12-20 18:43:49,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15733.333333333334, ans=0.0
2023-12-20 18:43:53,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=15733.333333333334, ans=0.007449275362318841
2023-12-20 18:43:58,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15800.0, ans=0.0
2023-12-20 18:44:15,173 INFO [train.py:886] (2/4) Epoch 46, batch 50, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 1123908.16 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:44:38,147 INFO [train.py:886] (2/4) Epoch 47, batch 0, loss[loss=0.01721, audio_tagging_loss=0.01721, over 20990.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 20990.00 frames. ], batch size: 106, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:44:38,148 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:44:48,564 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4234, 3.0464, 3.5152, 3.1036], device='cuda:2')
2023-12-20 18:44:59,320 INFO [train.py:917] (2/4) Epoch 47, validation: loss=0.06125, audio_tagging_loss=0.06125, over 3737520.00 frames.
2023-12-20 18:44:59,320 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:45:08,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=16013.333333333334, ans=0.33953333333333335
2023-12-20 18:45:12,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=16013.333333333334, ans=0.035
2023-12-20 18:45:14,000 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.428e+01 5.199e+01 5.973e+01 6.776e+01 1.435e+02, threshold=1.195e+02, percent-clipped=1.0
2023-12-20 18:45:26,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=13.530000000000001
2023-12-20 18:45:46,336 INFO [train.py:886] (2/4) Epoch 47, batch 50, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 1117903.36 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:46:04,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=16293.333333333334, ans=0.125
2023-12-20 18:46:08,724 INFO [train.py:886] (2/4) Epoch 48, batch 0, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24086.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 24086.00 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-20 18:46:08,725 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:46:29,402 INFO [train.py:917] (2/4) Epoch 48, validation: loss=0.06238, audio_tagging_loss=0.06238, over 3737520.00 frames.
2023-12-20 18:46:29,403 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:46:39,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=16360.0, ans=0.00731304347826087
2023-12-20 18:46:46,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=13.635
2023-12-20 18:46:47,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=13.635
2023-12-20 18:46:50,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=10.570666666666668
2023-12-20 18:46:53,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16426.666666666668, ans=0.125
2023-12-20 18:46:55,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16426.666666666668, ans=0.125
2023-12-20 18:47:10,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16560.0, ans=0.0
2023-12-20 18:47:14,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16560.0, ans=0.13440000000000002
2023-12-20 18:47:16,770 INFO [train.py:886] (2/4) Epoch 48, batch 50, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 1127183.65 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-20 18:47:37,825 INFO [train.py:886] (2/4) Epoch 49, batch 0, loss[loss=0.01802, audio_tagging_loss=0.01802, over 20499.00 frames. ], tot_loss[loss=0.01802, audio_tagging_loss=0.01802, over 20499.00 frames. ], batch size: 106, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:47:37,825 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:47:58,811 INFO [train.py:917] (2/4) Epoch 49, validation: loss=0.06394, audio_tagging_loss=0.06394, over 3737520.00 frames.
2023-12-20 18:47:58,811 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:48:09,451 WARNING [optim.py:484] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.348e+01 5.324e+01 6.019e+01 6.956e+01 1.317e+02, threshold=1.204e+02, percent-clipped=1.0
2023-12-20 18:48:09,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=16706.666666666668, ans=0.125
2023-12-20 18:48:09,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16706.666666666668, ans=0.125
2023-12-20 18:48:24,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=13.79
2023-12-20 18:48:28,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=16840.0, ans=0.125
2023-12-20 18:48:41,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=16906.666666666668, ans=0.0
2023-12-20 18:48:45,782 INFO [train.py:886] (2/4) Epoch 49, batch 50, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 1116344.42 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:49:07,494 INFO [train.py:886] (2/4) Epoch 50, batch 0, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24131.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 24131.00 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 64.0
2023-12-20 18:49:07,495 INFO [train.py:909] (2/4) Computing validation loss
2023-12-20 18:49:17,438 INFO [zipformer.py:1858] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1588, 2.8040, 2.7916, 2.7895], device='cuda:2')
2023-12-20 18:49:28,222 INFO [train.py:917] (2/4) Epoch 50, validation: loss=0.06678, audio_tagging_loss=0.06678, over 3737520.00 frames.
2023-12-20 18:49:28,223 INFO [train.py:918] (2/4) Maximum memory allocated so far is 14828MB
2023-12-20 18:49:33,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16986.666666666668, ans=0.0
2023-12-20 18:49:44,732 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.049e-02
2023-12-20 18:49:59,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17186.666666666668, ans=0.125
2023-12-20 18:50:06,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=17253.333333333332, ans=0.29613333333333347
2023-12-20 18:50:07,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=17253.333333333332, ans=0.007118840579710146
2023-12-20 18:50:15,466 INFO [train.py:886] (2/4) Epoch 50, batch 50, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 1119154.56 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 32.0
2023-12-20 18:50:15,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17320.0, ans=0.0
2023-12-20 18:50:18,099 INFO [train.py:1099] (2/4) Done!