2023-12-20 17:30:48,671 INFO [train.py:953] (1/4) Training started
2023-12-20 17:30:48,671 INFO [train.py:963] (1/4) Device: cuda:1
2023-12-20 17:30:48,671 INFO [train.py:965] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-clean', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-1218101249-5bcbfb5567-jsftr', 'IP address': '10.177.6.147'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-20 17:30:48,672 INFO [train.py:967] (1/4) About to create model
2023-12-20 17:30:53,840 INFO [train.py:971] (1/4) Number of model parameters: 64264454
2023-12-20 17:30:56,725 INFO [train.py:986] (1/4) Using DDP
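The run above uses 4 GPUs ('world_size': 4); this file is the rank-1 log, hence the (1/4) prefix and Device: cuda:1. A minimal sketch of the standard PyTorch DDP wrapping this corresponds to, assuming one process per GPU; the model constructor and exact arguments are placeholders, not the recipe's code:

    # Sketch only: one-process-per-GPU DDP setup in PyTorch.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")            # 4 processes; 'master_port': 13455
    rank = dist.get_rank()                     # this log comes from rank 1
    torch.cuda.set_device(rank)
    model = build_model().to(f"cuda:{rank}")   # build_model() is hypothetical
    model = DDP(model, device_ids=[rank])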
2023-12-20 17:30:57,442 INFO [at_datamodule.py:398] (1/4) About to get the audioset cuts for KD.
2023-12-20 17:30:57,498 INFO [at_datamodule.py:223] (1/4) Enable MUSAN
2023-12-20 17:30:57,498 INFO [at_datamodule.py:224] (1/4) About to get Musan cuts
2023-12-20 17:30:59,783 INFO [at_datamodule.py:248] (1/4) Enable SpecAugment
2023-12-20 17:30:59,783 INFO [at_datamodule.py:249] (1/4) Time warp factor: 80
2023-12-20 17:30:59,784 INFO [at_datamodule.py:259] (1/4) Num frame mask: 10
2023-12-20 17:30:59,784 INFO [at_datamodule.py:272] (1/4) About to create train dataset
2023-12-20 17:30:59,784 INFO [at_datamodule.py:299] (1/4) Using DynamicBucketingSampler.
2023-12-20 17:31:01,662 INFO [at_datamodule.py:315] (1/4) About to create train dataloader
2023-12-20 17:31:01,663 INFO [at_datamodule.py:410] (1/4) About to get test-other cuts
2023-12-20 17:31:01,664 INFO [at_datamodule.py:346] (1/4) About to create dev dataset
2023-12-20 17:31:02,110 INFO [at_datamodule.py:363] (1/4) About to create dev dataloader
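The datamodule messages above (MUSAN, SpecAugment, DynamicBucketingSampler) map onto standard lhotse components. A minimal sketch wired to the logged hyperparameters; the manifest filename is hypothetical and the surrounding dataset/dataloader plumbing is omitted:

    # Sketch: sampler and augmentation implied by the log, using lhotse APIs.
    from lhotse import CutSet
    from lhotse.dataset import DynamicBucketingSampler, SpecAugment

    cuts = CutSet.from_file("data/fbank/cuts_audioset_full.jsonl.gz")  # hypothetical name

    sampler = DynamicBucketingSampler(
        cuts,
        max_duration=1000,  # 'max_duration': seconds of audio per batch
        num_buckets=30,     # 'num_buckets': 30
        shuffle=True,       # 'shuffle': True
        drop_last=True,     # 'drop_last': True
    )
    spec_augment = SpecAugment(
        time_warp_factor=80,  # "Time warp factor: 80"
        num_frame_masks=10,   # "Num frame mask: 10"
    )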
2023-12-20 17:31:25,014 INFO [train.py:886] (1/4) Epoch 1, batch 0, loss[loss=1.835, audio_tagging_loss=1.835, over 24132.00 frames. ], tot_loss[loss=1.835, audio_tagging_loss=1.835, over 24132.00 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 2.0
2023-12-20 17:31:25,014 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:31:46,187 INFO [train.py:917] (1/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames.
2023-12-20 17:31:46,188 INFO [train.py:918] (1/4) Maximum memory allocated so far is 13125MB
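In the train.py loss lines, loss equals audio_tagging_loss because tagging is the only objective; it is a multi-label criterion over the 527 AudioSet classes ('num_events': 527). A hedged sketch of such a criterion; the recipe's exact normalization (the per-frame accounting behind "over N frames") is not reproduced here:

    # Sketch: multi-label audio tagging loss over 527 event classes.
    import torch
    import torch.nn.functional as F

    num_events = 527                           # 'num_events' from the config dump
    logits = torch.randn(100, num_events)      # (batch, classes); batch size 100 as logged
    targets = torch.randint(0, 2, (100, num_events)).float()  # multi-hot labels

    # Per-class binary cross-entropy with logits, averaged over batch and classes.
    audio_tagging_loss = F.binary_cross_entropy_with_logits(logits, targets)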
2023-12-20 17:31:50,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:50,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=7.5
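(The whitening limits in these lines are themselves scheduled values that grow over training: 7.5 here at batch 0, while the *.whitening_limit ScheduledFloat entries further down show 7.81 by batch_count=413 and 10.2 by batch_count=3600.)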
2023-12-20 17:31:56,787 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+02 8.568e+02 1.002e+03 1.369e+03 1.715e+03, threshold=4.006e+03, percent-clipped=0.0
2023-12-20 17:31:59,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.87 vs. limit=7.55
2023-12-20 17:32:01,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=66.66666666666667, ans=0.8976666666666667
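ScheduledFloat lines like this one report hyperparameters interpolated in batch_count; here bypass_mid.scale_min follows 0.9 - 3.5e-5 * batch_count (0.8977 at batch_count=66.7 above, 0.8757 at 693.3 below). A minimal sketch of such a piecewise-linear schedule; the breakpoints are inferred from the logged values, not read from the source:

    # Sketch: a float hyperparameter interpolated piecewise-linearly in batch_count.
    def scheduled_float(batch_count, points=((0.0, 0.9), (20000.0, 0.2))):
        (x0, y0), (x1, y1) = points        # inferred, illustrative breakpoints
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        frac = (batch_count - x0) / (x1 - x0)
        return y0 + frac * (y1 - y0)

    scheduled_float(66.666666)  # -> 0.897667, matching ans=0.8976666666666667 above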
2023-12-20 17:32:03,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=137.40 vs. limit=5.016666666666667
2023-12-20 17:32:06,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=338.86 vs. limit=5.033333333333333
2023-12-20 17:32:07,423 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 3.256e+02 7.044e+02 1.161e+03 1.783e+03, threshold=2.818e+03, percent-clipped=0.0
2023-12-20 17:32:08,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=238.11 vs. limit=7.55
2023-12-20 17:32:10,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=231.09 vs. limit=7.6
2023-12-20 17:32:17,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.64 vs. limit=7.65
2023-12-20 17:32:20,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=382.71 vs. limit=7.65
2023-12-20 17:32:22,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=302.92 vs. limit=7.575
2023-12-20 17:32:28,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=129.21 vs. limit=4.08
2023-12-20 17:32:30,896 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 1.290e+02 2.793e+02 8.337e+02 1.783e+03, threshold=1.117e+03, percent-clipped=0.0
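These optim.py warnings report the recent gradient-norm distribution (min / 25% / median / 75% / max) together with the clipping threshold derived from it; note the threshold falling (4.006e+03 -> 2.818e+03 -> 1.117e+03) as the early gradients settle. icefall's ScaledAdam uses its own step-dependent rule, so the sketch below only illustrates the general median-based idea:

    # Sketch of median-based gradient clipping; NOT icefall's exact rule.
    import torch

    def clip_gradients(params, norm_history, clipping_scale=2.0):
        grads = [p.grad for p in params if p.grad is not None]
        tot_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        norm_history.append(tot_norm.item())
        median = sorted(norm_history)[len(norm_history) // 2]
        threshold = clipping_scale * median    # assumed relation to the quartiles
        if tot_norm > threshold:
            for g in grads:
                g.mul_(threshold / tot_norm)   # scale down onto the threshold
        return tot_norm, threshold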
2023-12-20 17:32:39,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=266.6666666666667, ans=3.04
2023-12-20 17:32:42,072 INFO [train.py:886] (1/4) Epoch 1, batch 50, loss[loss=0.05699, audio_tagging_loss=0.05699, over 25000.00 frames. ], tot_loss[loss=0.3011, audio_tagging_loss=0.3011, over 1123484.19 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-20 17:33:00,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=135.29 vs. limit=7.76
2023-12-20 17:33:07,711 INFO [train.py:886] (1/4) Epoch 2, batch 0, loss[loss=0.05759, audio_tagging_loss=0.05759, over 25000.00 frames. ], tot_loss[loss=0.05759, audio_tagging_loss=0.05759, over 25000.00 frames. ], batch size: 100, lr: 2.44e-02, grad_scale: 4.0
2023-12-20 17:33:07,712 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:33:15,997 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1000, 5.3174, 4.9056, 5.2685], device='cuda:1')
2023-12-20 17:33:23,307 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.8108, 4.8136, 4.8126, 4.8180], device='cuda:1')
2023-12-20 17:33:28,177 INFO [train.py:917] (1/4) Epoch 2, validation: loss=0.0597, audio_tagging_loss=0.0597, over 3737520.00 frames.
2023-12-20 17:33:28,177 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:33:39,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=95.67 vs. limit=5.1033333333333335
2023-12-20 17:33:42,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=77.45 vs. limit=4.165333333333333
2023-12-20 17:33:42,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=413.3333333333333, ans=7.81
2023-12-20 17:33:44,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=324.85 vs. limit=7.81
2023-12-20 17:33:51,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=480.0, ans=0.4775
2023-12-20 17:33:57,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=342.98 vs. limit=7.68
2023-12-20 17:34:01,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=299.82 vs. limit=7.86
2023-12-20 17:34:16,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=211.96 vs. limit=7.96
2023-12-20 17:34:24,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=613.3333333333334, ans=0.8785333333333334
2023-12-20 17:34:25,459 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.968e+01 6.154e+01 2.791e+02 2.019e+03, threshold=1.231e+02, percent-clipped=1.0
2023-12-20 17:34:26,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=258.13 vs. limit=8.01
2023-12-20 17:34:26,584 INFO [train.py:886] (1/4) Epoch 2, batch 50, loss[loss=0.0579, audio_tagging_loss=0.0579, over 25000.00 frames. ], tot_loss[loss=0.05954, audio_tagging_loss=0.05954, over 1113463.61 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 2.0
2023-12-20 17:34:26,778 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:34:26,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=205.70 vs. limit=7.755
2023-12-20 17:34:44,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=693.3333333333334, ans=0.8757333333333334
2023-12-20 17:34:44,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=298.12 vs. limit=7.76
2023-12-20 17:34:52,016 INFO [train.py:886] (1/4) Epoch 3, batch 0, loss[loss=0.06998, audio_tagging_loss=0.06998, over 21459.00 frames. ], tot_loss[loss=0.06998, audio_tagging_loss=0.06998, over 21459.00 frames. ], batch size: 106, lr: 2.54e-02, grad_scale: 4.0
2023-12-20 17:34:52,017 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:35:12,452 INFO [train.py:917] (1/4) Epoch 3, validation: loss=0.05878, audio_tagging_loss=0.05878, over 3737520.00 frames.
2023-12-20 17:35:12,453 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:35:12,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=169.18 vs. limit=7.76
2023-12-20 17:35:13,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=693.3333333333334, ans=0.17400000000000002
2023-12-20 17:35:21,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=254.30 vs. limit=8.02
2023-12-20 17:35:24,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=105.70 vs. limit=5.38
2023-12-20 17:35:29,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=760.0, ans=5.475
2023-12-20 17:35:30,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=295.13 vs. limit=7.785
2023-12-20 17:35:33,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760.0, ans=0.464375
2023-12-20 17:35:33,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=214.83 vs. limit=7.785
2023-12-20 17:35:33,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=33.33 vs. limit=7.785
2023-12-20 17:35:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=826.6666666666666, ans=0.8710666666666667
2023-12-20 17:35:40,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=218.09 vs. limit=5.413333333333333
2023-12-20 17:35:41,371 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:35:43,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=7.81
2023-12-20 17:35:44,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=327.16 vs. limit=8.12
2023-12-20 17:35:46,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=315.28 vs. limit=7.81
2023-12-20 17:35:49,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=893.3333333333334, ans=0.458125
2023-12-20 17:35:55,312 WARNING [optim.py:500] (1/4) Scaling gradients by 0.09217905253171921, model_norm_threshold=123.07855224609375
2023-12-20 17:35:55,459 WARNING [optim.py:572] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.614e+05, grad_sumsq=6.752e+08, orig_rms_sq=1.276e-03
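(The two warnings above are mutually consistent if the scale is model_norm_threshold / grad_norm: 123.07855 / 0.09217905 ≈ 1.335e+03, i.e. the total gradient norm was roughly 1.3e+03 before scaling, with encoder_embed.conv.7.weight contributing about 48% of its squared magnitude. The scale/threshold relation is inferred from the numbers, not from the source.)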
2023-12-20 17:36:01,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=21.13 vs. limit=7.86
2023-12-20 17:36:05,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=4.384
2023-12-20 17:36:09,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=960.0, ans=0.164
2023-12-20 17:36:11,141 INFO [train.py:886] (1/4) Epoch 3, batch 50, loss[loss=0.05275, audio_tagging_loss=0.05275, over 25000.00 frames. ], tot_loss[loss=0.05574, audio_tagging_loss=0.05574, over 1121484.75 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:36:35,793 INFO [train.py:886] (1/4) Epoch 4, batch 0, loss[loss=0.05912, audio_tagging_loss=0.05912, over 25000.00 frames. ], tot_loss[loss=0.05912, audio_tagging_loss=0.05912, over 25000.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 8.0
2023-12-20 17:36:35,794 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:36:54,955 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3721, 5.0210, 4.1473, 4.7986], device='cuda:1')
2023-12-20 17:36:55,851 INFO [train.py:917] (1/4) Epoch 4, validation: loss=0.05673, audio_tagging_loss=0.05673, over 3737520.00 frames.
2023-12-20 17:36:55,852 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:37:11,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=123.07 vs. limit=5.553333333333334
2023-12-20 17:37:12,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.65 vs. limit=3.166
2023-12-20 17:37:23,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1173.3333333333333, ans=0.156
2023-12-20 17:37:25,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1173.3333333333333, ans=0.8589333333333333
2023-12-20 17:37:25,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=140.15 vs. limit=5.586666666666667
2023-12-20 17:37:33,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=4.496
2023-12-20 17:37:34,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240.0, ans=0.28759999999999997
2023-12-20 17:37:35,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=195.92 vs. limit=7.965
2023-12-20 17:37:45,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1306.6666666666667, ans=0.151
2023-12-20 17:37:48,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=139.40 vs. limit=5.653333333333333
2023-12-20 17:37:49,921 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.504e+01 2.720e+01 3.182e+01 1.335e+03, threshold=5.440e+01, percent-clipped=1.0
2023-12-20 17:37:52,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=338.24 vs. limit=7.99
2023-12-20 17:37:54,273 INFO [train.py:886] (1/4) Epoch 4, batch 50, loss[loss=0.05116, audio_tagging_loss=0.05116, over 25000.00 frames. ], tot_loss[loss=0.05483, audio_tagging_loss=0.05483, over 1117747.78 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 4.0
2023-12-20 17:38:12,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=153.26 vs. limit=8.54
2023-12-20 17:38:19,534 INFO [train.py:886] (1/4) Epoch 5, batch 0, loss[loss=0.05337, audio_tagging_loss=0.05337, over 24154.00 frames. ], tot_loss[loss=0.05337, audio_tagging_loss=0.05337, over 24154.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 8.0
2023-12-20 17:38:19,535 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:38:39,892 INFO [train.py:917] (1/4) Epoch 5, validation: loss=0.05523, audio_tagging_loss=0.05523, over 3737520.00 frames.
2023-12-20 17:38:39,893 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:38:46,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.99 vs. limit=5.693333333333333
2023-12-20 17:38:48,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1386.6666666666667, ans=0.435
2023-12-20 17:38:52,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=24.62 vs. limit=8.045
2023-12-20 17:38:56,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2023-12-20 17:38:58,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=150.98 vs. limit=8.59
2023-12-20 17:39:04,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=105.50 vs. limit=8.64
2023-12-20 17:39:08,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=36.14 vs. limit=4.304
2023-12-20 17:39:08,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=347.71 vs. limit=8.07
2023-12-20 17:39:10,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1520.0, ans=0.42875
2023-12-20 17:39:11,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=271.18 vs. limit=8.07
2023-12-20 17:39:11,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1520.0, ans=0.31
2023-12-20 17:39:14,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=63.68 vs. limit=8.07
2023-12-20 17:39:16,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=350.03 vs. limit=8.095
2023-12-20 17:39:20,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=91.05 vs. limit=8.095
2023-12-20 17:39:26,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=94.86 vs. limit=8.12
2023-12-20 17:39:28,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.39 vs. limit=8.74
2023-12-20 17:39:29,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=80.44 vs. limit=8.12
2023-12-20 17:39:32,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=329.69 vs. limit=8.12
2023-12-20 17:39:34,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1653.3333333333333, ans=0.044833333333333336
2023-12-20 17:39:38,884 INFO [train.py:886] (1/4) Epoch 5, batch 50, loss[loss=0.04927, audio_tagging_loss=0.04927, over 25000.00 frames. ], tot_loss[loss=0.05231, audio_tagging_loss=0.05231, over 1126559.95 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 8.0
2023-12-20 17:40:04,929 INFO [train.py:886] (1/4) Epoch 6, batch 0, loss[loss=0.05394, audio_tagging_loss=0.05394, over 24115.00 frames. ], tot_loss[loss=0.05394, audio_tagging_loss=0.05394, over 24115.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 16.0
2023-12-20 17:40:04,929 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:40:21,308 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1342, 4.5939, 4.2046, 3.9363], device='cuda:1')
2023-12-20 17:40:25,825 INFO [train.py:917] (1/4) Epoch 6, validation: loss=0.05425, audio_tagging_loss=0.05425, over 3737520.00 frames.
2023-12-20 17:40:25,826 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14782MB
2023-12-20 17:40:28,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1733.3333333333333, ans=0.08916666666666667
2023-12-20 17:40:33,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=16.42 vs. limit=5.433333333333334
2023-12-20 17:40:35,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=4.693333333333333
2023-12-20 17:40:37,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1800.0, ans=0.415625
2023-12-20 17:40:39,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=4.72
2023-12-20 17:40:51,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=20.00 vs. limit=5.466666666666667
2023-12-20 17:40:53,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-20 17:40:53,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1866.6666666666667, ans=0.8346666666666667
2023-12-20 17:41:00,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1933.3333333333333, ans=0.18211666666666668
2023-12-20 17:41:02,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=20.02 vs. limit=5.483333333333333
2023-12-20 17:41:06,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=184.24 vs. limit=8.225
2023-12-20 17:41:07,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=4.773333333333333
2023-12-20 17:41:08,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1933.3333333333333, ans=0.1275
2023-12-20 17:41:08,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=109.20 vs. limit=8.95
2023-12-20 17:41:09,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1933.3333333333333, ans=0.8323333333333334
2023-12-20 17:41:11,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2000.0, ans=0.40625
2023-12-20 17:41:14,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.556e+01 2.831e+01 3.472e+01 7.747e+01, threshold=5.662e+01, percent-clipped=6.0
2023-12-20 17:41:19,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2000.0, ans=0.055
2023-12-20 17:41:19,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=220.80 vs. limit=8.25
2023-12-20 17:41:22,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=79.27 vs. limit=6.0
2023-12-20 17:41:23,849 INFO [train.py:886] (1/4) Epoch 6, batch 50, loss[loss=0.04985, audio_tagging_loss=0.04985, over 25000.00 frames. ], tot_loss[loss=0.05205, audio_tagging_loss=0.05205, over 1119666.96 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 16.0
2023-12-20 17:41:24,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=26.65 vs. limit=4.826666666666666
2023-12-20 17:41:49,203 INFO [train.py:886] (1/4) Epoch 7, batch 0, loss[loss=0.06588, audio_tagging_loss=0.06588, over 20889.00 frames. ], tot_loss[loss=0.06588, audio_tagging_loss=0.06588, over 20889.00 frames. ], batch size: 106, lr: 2.60e-02, grad_scale: 32.0
2023-12-20 17:41:49,203 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:42:09,828 INFO [train.py:917] (1/4) Epoch 7, validation: loss=0.05269, audio_tagging_loss=0.05269, over 3737520.00 frames.
2023-12-20 17:42:09,828 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:42:11,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2080.0, ans=0.40249999999999997
2023-12-20 17:42:13,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=40.80 vs. limit=8.28
2023-12-20 17:42:13,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=208.37 vs. limit=8.28
2023-12-20 17:42:14,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=39.95 vs. limit=5.0
2023-12-20 17:42:27,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=8.305
2023-12-20 17:42:32,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=168.62 vs. limit=9.11
2023-12-20 17:42:45,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=36.42 vs. limit=6.14
2023-12-20 17:42:55,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=165.89 vs. limit=8.38
2023-12-20 17:42:57,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2346.6666666666665, ans=0.20666666666666667
2023-12-20 17:42:59,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=12.13 vs. limit=5.586666666666667
2023-12-20 17:43:00,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=103.45 vs. limit=9.26
2023-12-20 17:43:01,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=106.55 vs. limit=9.26
2023-12-20 17:43:06,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=39.82 vs. limit=8.405
2023-12-20 17:43:07,586 INFO [train.py:886] (1/4) Epoch 7, batch 50, loss[loss=0.04806, audio_tagging_loss=0.04806, over 25000.00 frames. ], tot_loss[loss=0.05213, audio_tagging_loss=0.05213, over 1118761.40 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 1.0
2023-12-20 17:43:07,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2413.3333333333335, ans=0.38687499999999997
2023-12-20 17:43:32,846 INFO [train.py:886] (1/4) Epoch 8, batch 0, loss[loss=0.05969, audio_tagging_loss=0.05969, over 21042.00 frames. ], tot_loss[loss=0.05969, audio_tagging_loss=0.05969, over 21042.00 frames. ], batch size: 106, lr: 2.60e-02, grad_scale: 2.0
2023-12-20 17:43:32,847 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:43:47,919 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8688, 4.3972, 3.5764, 3.4130], device='cuda:1')
2023-12-20 17:43:53,652 INFO [train.py:917] (1/4) Epoch 8, validation: loss=0.05155, audio_tagging_loss=0.05155, over 3737520.00 frames.
2023-12-20 17:43:53,653 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:43:55,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=59.31 vs. limit=9.32
2023-12-20 17:43:58,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=175.19 vs. limit=8.41
2023-12-20 17:44:04,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=41.73 vs. limit=8.41
2023-12-20 17:44:12,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=3.374
2023-12-20 17:44:22,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=33.31 vs. limit=9.42
2023-12-20 17:44:23,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=135.86 vs. limit=8.46
2023-12-20 17:44:23,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=26.90 vs. limit=6.28
2023-12-20 17:44:36,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2626.6666666666665, ans=0.10149999999999999
2023-12-20 17:44:39,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.14 vs. limit=5.673333333333334
2023-12-20 17:44:42,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2693.3333333333335, ans=0.22306666666666666
2023-12-20 17:44:42,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2693.3333333333335, ans=8.51
2023-12-20 17:44:43,338 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 3.487e+01 4.265e+01 5.657e+01 4.687e+02, threshold=8.530e+01, percent-clipped=24.0
2023-12-20 17:44:51,056 INFO [train.py:886] (1/4) Epoch 8, batch 50, loss[loss=0.04785, audio_tagging_loss=0.04785, over 25000.00 frames. ], tot_loss[loss=0.04955, audio_tagging_loss=0.04955, over 1114180.34 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 2.0
2023-12-20 17:44:51,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=110.51 vs. limit=9.57
2023-12-20 17:45:09,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.02 vs. limit=6.386666666666667
2023-12-20 17:45:16,350 INFO [train.py:886] (1/4) Epoch 9, batch 0, loss[loss=0.05045, audio_tagging_loss=0.05045, over 24090.00 frames. ], tot_loss[loss=0.05045, audio_tagging_loss=0.05045, over 24090.00 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 4.0
2023-12-20 17:45:16,350 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:45:37,429 INFO [train.py:917] (1/4) Epoch 9, validation: loss=0.04977, audio_tagging_loss=0.04977, over 3737520.00 frames.
2023-12-20 17:45:37,429 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:45:37,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=90.75 vs. limit=9.58
2023-12-20 17:45:38,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=40.18 vs. limit=8.54
2023-12-20 17:45:53,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=86.22 vs. limit=9.629999999999999
2023-12-20 17:46:04,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2906.6666666666665, ans=0.091
2023-12-20 17:46:12,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=29.26 vs. limit=9.73
2023-12-20 17:46:15,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2973.3333333333335, ans=0.08849999999999998
2023-12-20 17:46:21,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=74.98 vs. limit=8.64
2023-12-20 17:46:24,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3040.0, ans=0.076
2023-12-20 17:46:25,870 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.656e+00
2023-12-20 17:46:32,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3106.6666666666665, ans=0.08058333333333334
2023-12-20 17:46:33,268 INFO [train.py:886] (1/4) Epoch 9, batch 50, loss[loss=0.04912, audio_tagging_loss=0.04912, over 25000.00 frames. ], tot_loss[loss=0.04777, audio_tagging_loss=0.04777, over 1119621.40 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:46:52,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=5.78
2023-12-20 17:46:59,486 INFO [train.py:886] (1/4) Epoch 10, batch 0, loss[loss=0.04918, audio_tagging_loss=0.04918, over 25000.00 frames. ], tot_loss[loss=0.04918, audio_tagging_loss=0.04918, over 25000.00 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 8.0
2023-12-20 17:46:59,487 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:47:20,694 INFO [train.py:917] (1/4) Epoch 10, validation: loss=0.04858, audio_tagging_loss=0.04858, over 3737520.00 frames.
2023-12-20 17:47:20,695 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:47:21,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.67 vs. limit=6.5600000000000005
2023-12-20 17:47:21,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=102.46 vs. limit=8.67
2023-12-20 17:47:31,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3186.6666666666665, ans=0.21813333333333335
2023-12-20 17:47:31,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.60 vs. limit=6.593333333333334
2023-12-20 17:47:36,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:36,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:47,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3253.3333333333335, ans=0.26746666666666663
2023-12-20 17:47:51,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.78 vs. limit=8.72
2023-12-20 17:47:53,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=5.328
2023-12-20 17:47:56,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=8.745000000000001
2023-12-20 17:48:00,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs. limit=5.83
2023-12-20 17:48:00,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.16 vs. limit=9.99
2023-12-20 17:48:01,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=59.47 vs. limit=8.745000000000001
2023-12-20 17:48:02,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.056e+00
2023-12-20 17:48:04,134 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.726e+01 4.484e+01 5.424e+01 1.858e+02, threshold=8.969e+01, percent-clipped=3.0
2023-12-20 17:48:06,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=5.354666666666667
2023-12-20 17:48:06,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.02 vs. limit=10.04
2023-12-20 17:48:12,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.68 vs. limit=10.04
2023-12-20 17:48:15,991 INFO [train.py:886] (1/4) Epoch 10, batch 50, loss[loss=0.04728, audio_tagging_loss=0.04728, over 25000.00 frames. ], tot_loss[loss=0.04679, audio_tagging_loss=0.04679, over 1116564.21 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 8.0
2023-12-20 17:48:40,825 INFO [train.py:886] (1/4) Epoch 11, batch 0, loss[loss=0.05327, audio_tagging_loss=0.05327, over 21049.00 frames. ], tot_loss[loss=0.05327, audio_tagging_loss=0.05327, over 21049.00 frames. ], batch size: 106, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:48:40,825 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:48:53,030 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4948, 2.5191, 2.8846, 2.5371], device='cuda:1')
2023-12-20 17:49:01,997 INFO [train.py:917] (1/4) Epoch 11, validation: loss=0.04728, audio_tagging_loss=0.04728, over 3737520.00 frames.
2023-12-20 17:49:01,998 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:49:06,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3466.6666666666665, ans=7.166666666666666
2023-12-20 17:49:22,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:23,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:25,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3600.0, ans=10.2
2023-12-20 17:49:26,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=35.67 vs. limit=8.85
2023-12-20 17:49:26,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.88 vs. limit=8.85
2023-12-20 17:49:30,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3600.0, ans=0.06499999999999997
2023-12-20 17:49:30,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=8.85
2023-12-20 17:49:31,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125
2023-12-20 17:49:32,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3600.0, ans=0.33125
2023-12-20 17:49:36,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3666.6666666666665, ans=0.328125
2023-12-20 17:49:38,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=43.03 vs. limit=8.875
2023-12-20 17:49:38,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=10.25
2023-12-20 17:49:51,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=28.44 vs. limit=6.866666666666667
2023-12-20 17:49:53,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.51 vs. limit=10.3
2023-12-20 17:49:56,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3733.3333333333335, ans=0.07
2023-12-20 17:49:58,310 INFO [train.py:886] (1/4) Epoch 11, batch 50, loss[loss=0.04305, audio_tagging_loss=0.04305, over 25000.00 frames. ], tot_loss[loss=0.04557, audio_tagging_loss=0.04557, over 1113738.85 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:50:23,184 INFO [train.py:886] (1/4) Epoch 12, batch 0, loss[loss=0.04627, audio_tagging_loss=0.04627, over 25000.00 frames. ], tot_loss[loss=0.04627, audio_tagging_loss=0.04627, over 25000.00 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:50:23,184 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:50:36,844 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.3127, 1.5376, 1.3518, 1.2430], device='cuda:1')
2023-12-20 17:50:44,480 INFO [train.py:917] (1/4) Epoch 12, validation: loss=0.04619, audio_tagging_loss=0.04619, over 3737520.00 frames.
2023-12-20 17:50:44,480 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:50:47,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=26.68 vs. limit=8.93
2023-12-20 17:50:48,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.31 vs. limit=8.93
2023-12-20 17:50:52,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=8.93
2023-12-20 17:50:58,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3880.0, ans=0.318125
2023-12-20 17:50:59,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=8.955
2023-12-20 17:51:07,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.49 vs. limit=6.973333333333333
2023-12-20 17:51:12,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.09 vs. limit=10.46
2023-12-20 17:51:14,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=8.98
2023-12-20 17:51:24,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:24,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=35.79 vs. limit=9.004999999999999
2023-12-20 17:51:25,063 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.849e+01 4.841e+01 5.572e+01 8.770e+01, threshold=9.682e+01, percent-clipped=0.0
2023-12-20 17:51:26,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=27.31 vs. limit=10.51
2023-12-20 17:51:29,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.53 vs. limit=7.04
2023-12-20 17:51:29,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.23 vs. limit=7.04
2023-12-20 17:51:35,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4080.0, ans=0.04966666666666667
2023-12-20 17:51:39,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=5.632
2023-12-20 17:51:40,930 INFO [train.py:886] (1/4) Epoch 12, batch 50, loss[loss=0.04324, audio_tagging_loss=0.04324, over 25000.00 frames. ], tot_loss[loss=0.04382, audio_tagging_loss=0.04382, over 1123444.17 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:52:04,705 INFO [train.py:886] (1/4) Epoch 13, batch 0, loss[loss=0.04383, audio_tagging_loss=0.04383, over 24057.00 frames. ], tot_loss[loss=0.04383, audio_tagging_loss=0.04383, over 24057.00 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:52:04,706 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:52:12,860 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.6706, 1.7295, 1.9306, 1.5983], device='cuda:1')
2023-12-20 17:52:25,609 INFO [train.py:917] (1/4) Epoch 13, validation: loss=0.04525, audio_tagging_loss=0.04525, over 3737520.00 frames.
2023-12-20 17:52:25,610 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:52:25,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4160.0, ans=0.04933333333333333
2023-12-20 17:52:36,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4226.666666666667, ans=0.2577333333333333
2023-12-20 17:52:40,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=9.085
2023-12-20 17:52:52,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.59 vs. limit=10.719999999999999
2023-12-20 17:52:52,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=10.719999999999999
2023-12-20 17:52:58,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4360.0, ans=0.295625
2023-12-20 17:52:58,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=48.43 vs. limit=9.135
2023-12-20 17:53:00,497 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.133e+01
2023-12-20 17:53:01,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=46.56 vs. limit=9.135
2023-12-20 17:53:04,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=9.135
2023-12-20 17:53:14,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=6.1066666666666665
2023-12-20 17:53:15,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=7.213333333333333
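The recurring "Whitening: ... metric=X vs. limit=Y" lines fire when a module's feature covariance measures less "white" than its (scheduled) limit. One metric with the right behaviour, equal to 1.0 exactly when a group's covariance is a multiple of the identity and growing as the features become anisotropic, is sketched below; this is a plausible reading, not necessarily scaling.py's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """Illustrative whitening metric over (num_frames, num_channels) features."""
        num_frames, num_channels = x.shape
        d = num_channels // num_groups
        xg = x.reshape(num_frames, num_groups, d).permute(1, 0, 2)  # (G, N, d)
        cov = xg.transpose(1, 2) @ xg / num_frames                  # (G, d, d)
        frob2 = (cov ** 2).sum(dim=(1, 2))                # sum of squared eigenvalues
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)   # sum of eigenvalues
        # Cauchy-Schwarz: d * ||C||_F^2 >= tr(C)^2, equality iff C = c * I.
        return (d * frob2 / trace.clamp(min=1e-20) ** 2).mean()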
2023-12-20 17:53:19,087 INFO [train.py:886] (1/4) Epoch 13, batch 50, loss[loss=0.04217, audio_tagging_loss=0.04217, over 25000.00 frames. ], tot_loss[loss=0.04317, audio_tagging_loss=0.04317, over 1118034.72 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:53:43,851 INFO [train.py:886] (1/4) Epoch 14, batch 0, loss[loss=0.04432, audio_tagging_loss=0.04432, over 24126.00 frames. ], tot_loss[loss=0.04432, audio_tagging_loss=0.04432, over 24126.00 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:53:43,852 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:53:54,752 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.5347, 2.3677, 2.5217, 2.6816], device='cuda:1')
2023-12-20 17:54:05,169 INFO [train.py:917] (1/4) Epoch 14, validation: loss=0.04503, audio_tagging_loss=0.04503, over 3737520.00 frames.
2023-12-20 17:54:05,170 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:54:10,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=9.19
2023-12-20 17:54:35,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.59 vs. limit=10.98
2023-12-20 17:54:37,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4706.666666666667, ans=0.27937500000000004
2023-12-20 17:54:37,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=9.265
2023-12-20 17:54:37,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=9.265
2023-12-20 17:54:38,363 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 4.195e+01 5.214e+01 6.348e+01 1.962e+02, threshold=1.043e+02, percent-clipped=5.0
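The optim.py:484 WARNINGs summarize recent gradient norms as [min, 25%, 50%, 75%, max]; in every such line in this log the threshold equals Clipping_scale times the median (above: 2.0 x 5.214e+01 ~ 1.043e+02), and percent-clipped is the share of recent batches whose norm exceeded it. A sketch of that bookkeeping; the window size is an assumption, not icefall's ScaledAdam internals:

    import torch

    class GradNormClipper:
        """Clip at clipping_scale x median of recent grad norms (sketch)."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []      # recent total gradient norms
            self.clipped = 0
            self.total = 0

        def step(self, params) -> None:
            grads = [p.grad for p in params if p.grad is not None]
            if not grads:
                return
            norm = torch.stack([g.norm() for g in grads]).norm().item()
            self.norms = (self.norms + [norm])[-self.window:]
            q = torch.tensor(self.norms).quantile(
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))  # the logged quartiles
            threshold = self.clipping_scale * q[2].item()    # scale x median
            self.total += 1
            if norm > threshold:
                self.clipped += 1                # feeds percent-clipped
                for g in grads:
                    g.mul_(threshold / norm)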
2023-12-20 17:54:40,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=9.265
2023-12-20 17:54:57,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4840.0, ans=0.273125
2023-12-20 17:54:58,024 INFO [train.py:886] (1/4) Epoch 14, batch 50, loss[loss=0.04078, audio_tagging_loss=0.04078, over 25000.00 frames. ], tot_loss[loss=0.04263, audio_tagging_loss=0.04263, over 1119562.76 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:55:22,493 INFO [train.py:886] (1/4) Epoch 15, batch 0, loss[loss=0.04204, audio_tagging_loss=0.04204, over 25000.00 frames. ], tot_loss[loss=0.04204, audio_tagging_loss=0.04204, over 25000.00 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:55:22,494 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:55:43,373 INFO [train.py:917] (1/4) Epoch 15, validation: loss=0.04452, audio_tagging_loss=0.04452, over 3737520.00 frames.
2023-12-20 17:55:43,373 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:55:44,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4853.333333333333, ans=0.20146666666666668
2023-12-20 17:55:44,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4853.333333333333, ans=9.32
2023-12-20 17:55:47,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=34.74 vs. limit=9.32
2023-12-20 17:55:49,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=25.22 vs. limit=9.32
2023-12-20 17:55:56,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=9.345
2023-12-20 17:56:01,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=11.19
2023-12-20 17:56:11,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. limit=6.246666666666667
2023-12-20 17:56:17,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5053.333333333333, ans=0.263125
2023-12-20 17:56:17,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=9.395
2023-12-20 17:56:23,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=9.395
2023-12-20 17:56:27,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=9.42
2023-12-20 17:56:31,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5120.0, ans=0.09899494936611666
2023-12-20 17:56:35,411 INFO [train.py:886] (1/4) Epoch 15, batch 50, loss[loss=0.0422, audio_tagging_loss=0.0422, over 25000.00 frames. ], tot_loss[loss=0.04144, audio_tagging_loss=0.04144, over 1124893.53 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:57:00,247 INFO [train.py:886] (1/4) Epoch 16, batch 0, loss[loss=0.04654, audio_tagging_loss=0.04654, over 22112.00 frames. ], tot_loss[loss=0.04654, audio_tagging_loss=0.04654, over 22112.00 frames. ], batch size: 106, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:57:00,248 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:57:13,238 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9660, 1.7526, 2.0088, 1.8561], device='cuda:1')
2023-12-20 17:57:21,259 INFO [train.py:917] (1/4) Epoch 16, validation: loss=0.04383, audio_tagging_loss=0.04383, over 3737520.00 frames.
2023-12-20 17:57:21,259 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:57:26,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5200.0, ans=0.25625
2023-12-20 17:57:27,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=11.4
2023-12-20 17:57:28,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5200.0, ans=0.278
2023-12-20 17:57:29,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=53.44 vs. limit=9.45
2023-12-20 17:57:33,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=9.475
2023-12-20 17:57:38,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=11.45
2023-12-20 17:57:42,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=11.5
2023-12-20 17:57:43,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=9.5
2023-12-20 17:57:47,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=11.5
2023-12-20 17:57:49,876 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.797e+01 3.933e+01 4.813e+01 5.766e+01 2.623e+02, threshold=9.626e+01, percent-clipped=4.0
2023-12-20 17:57:51,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=9.5
2023-12-20 17:58:06,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5466.666666666667, ans=0.24375000000000002
2023-12-20 17:58:12,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5466.666666666667, ans=0.24375000000000002
2023-12-20 17:58:14,014 INFO [train.py:886] (1/4) Epoch 16, batch 50, loss[loss=0.03823, audio_tagging_loss=0.03823, over 25000.00 frames. ], tot_loss[loss=0.04063, audio_tagging_loss=0.04063, over 1120529.57 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:58:38,071 INFO [train.py:886] (1/4) Epoch 17, batch 0, loss[loss=0.0434, audio_tagging_loss=0.0434, over 24114.00 frames. ], tot_loss[loss=0.0434, audio_tagging_loss=0.0434, over 24114.00 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 17:58:38,072 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:58:46,293 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.8670, 1.1711, 1.7423, 1.7265], device='cuda:1')
2023-12-20 17:58:59,165 INFO [train.py:917] (1/4) Epoch 17, validation: loss=0.04362, audio_tagging_loss=0.04362, over 3737520.00 frames.
2023-12-20 17:58:59,166 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:59:10,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5613.333333333333, ans=0.236875
2023-12-20 17:59:12,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=11.71
2023-12-20 17:59:17,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5613.333333333333, ans=0.0
2023-12-20 17:59:17,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5613.333333333333, ans=0.8061333333333334
2023-12-20 17:59:18,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=11.76
2023-12-20 17:59:35,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5746.666666666667, ans=0.0
2023-12-20 17:59:49,922 INFO [train.py:886] (1/4) Epoch 17, batch 50, loss[loss=0.03733, audio_tagging_loss=0.03733, over 25000.00 frames. ], tot_loss[loss=0.0399, audio_tagging_loss=0.0399, over 1120885.05 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 18:00:14,303 INFO [train.py:886] (1/4) Epoch 18, batch 0, loss[loss=0.04102, audio_tagging_loss=0.04102, over 24118.00 frames. ], tot_loss[loss=0.04102, audio_tagging_loss=0.04102, over 24118.00 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:00:14,303 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:00:35,062 INFO [train.py:917] (1/4) Epoch 18, validation: loss=0.04342, audio_tagging_loss=0.04342, over 3737520.00 frames.
2023-12-20 18:00:35,063 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:00:45,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5960.0, ans=0.2404
2023-12-20 18:00:49,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=9.735
2023-12-20 18:00:53,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.76 vs. limit=7.98
2023-12-20 18:00:58,723 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.667e+01 4.319e+01 5.687e+01 1.553e+02, threshold=8.639e+01, percent-clipped=3.0
2023-12-20 18:01:00,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6026.666666666667, ans=0.23973333333333333
2023-12-20 18:01:01,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=12.02
2023-12-20 18:01:08,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6093.333333333333, ans=0.23906666666666665
2023-12-20 18:01:18,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=6160.0, ans=0.21125
2023-12-20 18:01:18,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=9.81
2023-12-20 18:01:21,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=6160.0, ans=0.041
2023-12-20 18:01:25,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=6.490666666666667
2023-12-20 18:01:25,748 INFO [train.py:886] (1/4) Epoch 18, batch 50, loss[loss=0.03687, audio_tagging_loss=0.03687, over 25000.00 frames. ], tot_loss[loss=0.03919, audio_tagging_loss=0.03919, over 1123284.48 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:01:50,820 INFO [train.py:886] (1/4) Epoch 19, batch 0, loss[loss=0.05174, audio_tagging_loss=0.05174, over 20735.00 frames. ], tot_loss[loss=0.05174, audio_tagging_loss=0.05174, over 20735.00 frames. ], batch size: 106, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:01:50,821 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:02:11,830 INFO [train.py:917] (1/4) Epoch 19, validation: loss=0.04287, audio_tagging_loss=0.04287, over 3737520.00 frames.
2023-12-20 18:02:11,831 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:02:12,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=6240.0, ans=0.20750000000000002
2023-12-20 18:02:28,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=22.02 vs. limit=9.865
2023-12-20 18:02:29,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=5.261333333333333
2023-12-20 18:02:41,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=9.915
2023-12-20 18:02:58,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=6506.666666666667, ans=0.195
2023-12-20 18:03:02,034 INFO [train.py:886] (1/4) Epoch 19, batch 50, loss[loss=0.0379, audio_tagging_loss=0.0379, over 25000.00 frames. ], tot_loss[loss=0.03867, audio_tagging_loss=0.03867, over 1113120.79 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:03:26,286 INFO [train.py:886] (1/4) Epoch 20, batch 0, loss[loss=0.04763, audio_tagging_loss=0.04763, over 20688.00 frames. ], tot_loss[loss=0.04763, audio_tagging_loss=0.04763, over 20688.00 frames. ], batch size: 106, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:03:26,287 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:03:47,096 INFO [train.py:917] (1/4) Epoch 20, validation: loss=0.0429, audio_tagging_loss=0.0429, over 3737520.00 frames.
2023-12-20 18:03:47,097 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:03:59,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=6653.333333333333, ans=0.188125
2023-12-20 18:04:03,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=12.49
2023-12-20 18:04:04,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=9.995000000000001
2023-12-20 18:04:06,505 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.799e+01 4.551e+01 5.624e+01 1.513e+02, threshold=9.102e+01, percent-clipped=5.0
2023-12-20 18:04:19,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=6786.666666666667, ans=0.181875
2023-12-20 18:04:23,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=6786.666666666667, ans=0.6624666666666666
2023-12-20 18:04:28,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=6853.333333333333, ans=0.17875000000000002
2023-12-20 18:04:31,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=6853.333333333333, ans=0.6601333333333333
2023-12-20 18:04:34,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=10.07
2023-12-20 18:04:37,015 INFO [train.py:886] (1/4) Epoch 20, batch 50, loss[loss=0.03946, audio_tagging_loss=0.03946, over 25000.00 frames. ], tot_loss[loss=0.03792, audio_tagging_loss=0.03792, over 1119689.44 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:04:59,861 INFO [train.py:886] (1/4) Epoch 21, batch 0, loss[loss=0.03549, audio_tagging_loss=0.03549, over 25000.00 frames. ], tot_loss[loss=0.03549, audio_tagging_loss=0.03549, over 25000.00 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:04:59,862 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:05:20,819 INFO [train.py:917] (1/4) Epoch 21, validation: loss=0.0427, audio_tagging_loss=0.0427, over 3737520.00 frames.
2023-12-20 18:05:20,819 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:05:31,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7000.0, ans=0.0
2023-12-20 18:05:35,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=29.45 vs. limit=12.75
2023-12-20 18:05:36,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.125
2023-12-20 18:05:50,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=7066.666666666667, ans=9.416666666666668
2023-12-20 18:06:10,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=12.95
2023-12-20 18:06:10,787 INFO [train.py:886] (1/4) Epoch 21, batch 50, loss[loss=0.03565, audio_tagging_loss=0.03565, over 25000.00 frames. ], tot_loss[loss=0.03737, audio_tagging_loss=0.03737, over 1116419.59 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:06:34,955 INFO [train.py:886] (1/4) Epoch 22, batch 0, loss[loss=0.04177, audio_tagging_loss=0.04177, over 24182.00 frames. ], tot_loss[loss=0.04177, audio_tagging_loss=0.04177, over 24182.00 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 32.0
2023-12-20 18:06:34,956 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:06:48,027 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1941, 2.0806, 2.3632, 2.1985], device='cuda:1')
2023-12-20 18:06:55,957 INFO [train.py:917] (1/4) Epoch 22, validation: loss=0.04259, audio_tagging_loss=0.04259, over 3737520.00 frames.
2023-12-20 18:06:55,958 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:06:58,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=7280.0, ans=0.036333333333333336
2023-12-20 18:06:59,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=12.96
2023-12-20 18:07:01,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=7280.0, ans=0.15875
2023-12-20 18:07:09,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=10.254999999999999
2023-12-20 18:07:10,812 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.757e+01 4.513e+01 5.428e+01 2.125e+02, threshold=9.026e+01, percent-clipped=5.0
2023-12-20 18:07:31,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=7480.0, ans=0.0
2023-12-20 18:07:31,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=10.305
2023-12-20 18:07:37,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=7546.666666666667, ans=0.035222222222222224
2023-12-20 18:07:44,532 INFO [train.py:886] (1/4) Epoch 22, batch 50, loss[loss=0.03513, audio_tagging_loss=0.03513, over 25000.00 frames. ], tot_loss[loss=0.03668, audio_tagging_loss=0.03668, over 1115831.91 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 32.0
2023-12-20 18:07:44,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=7.045333333333334
2023-12-20 18:08:02,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=13.219999999999999
2023-12-20 18:08:08,666 INFO [train.py:886] (1/4) Epoch 23, batch 0, loss[loss=0.0415, audio_tagging_loss=0.0415, over 21567.00 frames. ], tot_loss[loss=0.0415, audio_tagging_loss=0.0415, over 21567.00 frames. ], batch size: 106, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:08:08,666 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:08:30,060 INFO [train.py:917] (1/4) Epoch 23, validation: loss=0.04291, audio_tagging_loss=0.04291, over 3737520.00 frames.
2023-12-20 18:08:30,061 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:08:31,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=7626.666666666667, ans=10.36
2023-12-20 18:08:32,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=7626.666666666667, ans=0.14250000000000002
2023-12-20 18:08:46,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=13.27
2023-12-20 18:08:47,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=10.385
2023-12-20 18:08:57,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=7760.0, ans=0.13624999999999998
2023-12-20 18:08:57,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=10.41
2023-12-20 18:09:17,953 INFO [train.py:886] (1/4) Epoch 23, batch 50, loss[loss=0.03463, audio_tagging_loss=0.03463, over 25000.00 frames. ], tot_loss[loss=0.03506, audio_tagging_loss=0.03506, over 1123814.02 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:09:40,309 INFO [train.py:886] (1/4) Epoch 24, batch 0, loss[loss=0.04475, audio_tagging_loss=0.04475, over 21395.00 frames. ], tot_loss[loss=0.04475, audio_tagging_loss=0.04475, over 21395.00 frames. ], batch size: 106, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:09:40,310 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:10:01,279 INFO [train.py:917] (1/4) Epoch 24, validation: loss=0.04248, audio_tagging_loss=0.04248, over 3737520.00 frames.
2023-12-20 18:10:01,280 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:10:04,232 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=1.052e+01
2023-12-20 18:10:06,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=7973.333333333333, ans=0.02
2023-12-20 18:10:12,545 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.651e+01 4.128e+01 4.777e+01 1.617e+02, threshold=8.255e+01, percent-clipped=1.0
2023-12-20 18:10:15,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=8040.0, ans=0.6186
2023-12-20 18:10:17,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:17,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:36,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8173.333333333333, ans=0.125
2023-12-20 18:10:49,626 INFO [train.py:886] (1/4) Epoch 24, batch 50, loss[loss=0.03358, audio_tagging_loss=0.03358, over 25000.00 frames. ], tot_loss[loss=0.03468, audio_tagging_loss=0.03468, over 1116019.87 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:11:13,606 INFO [train.py:886] (1/4) Epoch 25, batch 0, loss[loss=0.03754, audio_tagging_loss=0.03754, over 24081.00 frames. ], tot_loss[loss=0.03754, audio_tagging_loss=0.03754, over 24081.00 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:11:13,607 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:11:34,708 INFO [train.py:917] (1/4) Epoch 25, validation: loss=0.04257, audio_tagging_loss=0.04257, over 3737520.00 frames.
2023-12-20 18:11:34,708 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:11:57,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=10.67
2023-12-20 18:12:02,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=8520.0, ans=0.125
2023-12-20 18:12:10,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8520.0, ans=0.2148
2023-12-20 18:12:10,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=7.4079999999999995
2023-12-20 18:12:11,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. limit=9.26
2023-12-20 18:12:14,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8586.666666666666, ans=0.21413333333333334
2023-12-20 18:12:22,263 INFO [train.py:886] (1/4) Epoch 25, batch 50, loss[loss=0.0314, audio_tagging_loss=0.0314, over 25000.00 frames. ], tot_loss[loss=0.03346, audio_tagging_loss=0.03346, over 1117704.48 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:12:45,038 INFO [train.py:886] (1/4) Epoch 26, batch 0, loss[loss=0.04567, audio_tagging_loss=0.04567, over 21448.00 frames. ], tot_loss[loss=0.04567, audio_tagging_loss=0.04567, over 21448.00 frames. ], batch size: 106, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:12:45,039 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:13:05,885 INFO [train.py:917] (1/4) Epoch 26, validation: loss=0.04241, audio_tagging_loss=0.04241, over 3737520.00 frames.
2023-12-20 18:13:05,886 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:13:12,409 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.673e+01 4.044e+01 4.675e+01 8.607e+01, threshold=8.088e+01, percent-clipped=1.0
2023-12-20 18:13:24,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212
2023-12-20 18:13:25,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=7.52
2023-12-20 18:13:46,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=8933.333333333334, ans=0.5873333333333334
2023-12-20 18:13:53,005 INFO [train.py:886] (1/4) Epoch 26, batch 50, loss[loss=0.03165, audio_tagging_loss=0.03165, over 25000.00 frames. ], tot_loss[loss=0.03303, audio_tagging_loss=0.03303, over 1118245.57 frames. ], batch size: 100, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:14:18,298 INFO [train.py:886] (1/4) Epoch 27, batch 0, loss[loss=0.04504, audio_tagging_loss=0.04504, over 20685.00 frames. ], tot_loss[loss=0.04504, audio_tagging_loss=0.04504, over 20685.00 frames. ], batch size: 106, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:14:18,299 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:14:31,255 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1727, 4.1085, 4.0626, 3.8466], device='cuda:1')
2023-12-20 18:14:37,832 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3524, 2.1953, 2.3511, 2.2864], device='cuda:1')
2023-12-20 18:14:39,329 INFO [train.py:917] (1/4) Epoch 27, validation: loss=0.04294, audio_tagging_loss=0.04294, over 3737520.00 frames.
2023-12-20 18:14:39,330 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:14:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9013.333333333334, ans=0.125
2023-12-20 18:14:54,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=25.74 vs. limit=10.905
2023-12-20 18:15:11,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=9213.333333333334, ans=0.125
2023-12-20 18:15:11,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=9213.333333333334, ans=0.0
2023-12-20 18:15:14,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=9213.333333333334, ans=0.125
2023-12-20 18:15:19,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9280.0, ans=0.2072
2023-12-20 18:15:19,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.13 vs. limit=14.46
2023-12-20 18:15:26,760 INFO [train.py:886] (1/4) Epoch 27, batch 50, loss[loss=0.03001, audio_tagging_loss=0.03001, over 25000.00 frames. ], tot_loss[loss=0.03177, audio_tagging_loss=0.03177, over 1123725.78 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:15:48,259 INFO [train.py:886] (1/4) Epoch 28, batch 0, loss[loss=0.03468, audio_tagging_loss=0.03468, over 24092.00 frames. ], tot_loss[loss=0.03468, audio_tagging_loss=0.03468, over 24092.00 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:15:48,259 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:16:09,708 INFO [train.py:917] (1/4) Epoch 28, validation: loss=0.04282, audio_tagging_loss=0.04282, over 3737520.00 frames.
2023-12-20 18:16:09,709 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:16:12,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.970e+01 4.630e+01 5.343e+01 9.281e+01, threshold=9.260e+01, percent-clipped=1.0
2023-12-20 18:16:25,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=9426.666666666666, ans=0.027388888888888893
2023-12-20 18:16:28,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=11.06
2023-12-20 18:16:33,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=9493.333333333334, ans=0.008805797101449275
2023-12-20 18:16:39,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=9560.0, ans=0.5654
2023-12-20 18:16:56,949 INFO [train.py:886] (1/4) Epoch 28, batch 50, loss[loss=0.02919, audio_tagging_loss=0.02919, over 25000.00 frames. ], tot_loss[loss=0.03098, audio_tagging_loss=0.03098, over 1121594.30 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:17:19,797 INFO [train.py:886] (1/4) Epoch 29, batch 0, loss[loss=0.03225, audio_tagging_loss=0.03225, over 25000.00 frames. ], tot_loss[loss=0.03225, audio_tagging_loss=0.03225, over 25000.00 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:17:19,797 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:17:30,806 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7690, 2.3818, 2.4859, 2.7226], device='cuda:1')
2023-12-20 18:17:32,138 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.8887, 1.7890, 1.7039, 1.8671, 1.7203, 1.8254, 1.5409, 1.6900], device='cuda:1')
2023-12-20 18:17:40,754 INFO [train.py:917] (1/4) Epoch 29, validation: loss=0.04276, audio_tagging_loss=0.04276, over 3737520.00 frames.
2023-12-20 18:17:40,755 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:17:40,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=9706.666666666666, ans=0.125
2023-12-20 18:18:10,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=9906.666666666666, ans=0.125
2023-12-20 18:18:16,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=11.215
2023-12-20 18:18:29,133 INFO [train.py:886] (1/4) Epoch 29, batch 50, loss[loss=0.02827, audio_tagging_loss=0.02827, over 25000.00 frames. ], tot_loss[loss=0.02979, audio_tagging_loss=0.02979, over 1127124.95 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:18:30,003 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 4.177e+01 4.600e+01 5.564e+01 7.757e+01, threshold=9.200e+01, percent-clipped=0.0
2023-12-20 18:18:51,717 INFO [train.py:886] (1/4) Epoch 30, batch 0, loss[loss=0.03938, audio_tagging_loss=0.03938, over 20030.00 frames. ], tot_loss[loss=0.03938, audio_tagging_loss=0.03938, over 20030.00 frames. ], batch size: 106, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:18:51,718 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:19:02,612 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1611, 1.9390, 1.9798, 1.3439, 1.8295, 1.8108, 1.7752, 1.8664], device='cuda:1')
2023-12-20 18:19:04,668 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1123, 1.9960, 1.9897, 1.6441, 1.9536, 1.8918, 1.8109, 1.8462], device='cuda:1')
2023-12-20 18:19:12,597 INFO [train.py:917] (1/4) Epoch 30, validation: loss=0.04346, audio_tagging_loss=0.04346, over 3737520.00 frames.
2023-12-20 18:19:12,597 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:19:29,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.059999999999999
2023-12-20 18:19:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10120.0, ans=0.125
2023-12-20 18:19:45,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=10253.333333333334, ans=0.5411333333333334
2023-12-20 18:19:59,939 INFO [train.py:886] (1/4) Epoch 30, batch 50, loss[loss=0.02916, audio_tagging_loss=0.02916, over 25000.00 frames. ], tot_loss[loss=0.02901, audio_tagging_loss=0.02901, over 1119414.10 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:20:00,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=10386.666666666666, ans=0.125
2023-12-20 18:20:22,372 INFO [train.py:886] (1/4) Epoch 31, batch 0, loss[loss=0.03037, audio_tagging_loss=0.03037, over 24148.00 frames. ], tot_loss[loss=0.03037, audio_tagging_loss=0.03037, over 24148.00 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 32.0
2023-12-20 18:20:22,372 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:20:32,884 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0702, 1.7091, 1.7310, 1.8776, 1.7862, 1.8434, 1.5666, 1.7082], device='cuda:1')
2023-12-20 18:20:33,514 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3741, 2.3594, 2.5183, 2.3064], device='cuda:1')
2023-12-20 18:20:43,507 INFO [train.py:917] (1/4) Epoch 31, validation: loss=0.04363, audio_tagging_loss=0.04363, over 3737520.00 frames.
2023-12-20 18:20:43,508 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:20:46,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=10400.0, ans=0.023333333333333334
2023-12-20 18:20:51,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10400.0, ans=0.125
2023-12-20 18:21:07,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.4
2023-12-20 18:21:14,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=10600.0, ans=0.022500000000000003
2023-12-20 18:21:29,125 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.385e+01 4.278e+01 4.904e+01 5.799e+01 1.168e+02, threshold=9.808e+01, percent-clipped=2.0
2023-12-20 18:21:30,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=10666.666666666666, ans=0.0
2023-12-20 18:21:31,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=10.366666666666667
2023-12-20 18:21:31,820 INFO [train.py:886] (1/4) Epoch 31, batch 50, loss[loss=0.02962, audio_tagging_loss=0.02962, over 25000.00 frames. ], tot_loss[loss=0.0283, audio_tagging_loss=0.0283, over 1126555.81 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 32.0
2023-12-20 18:21:54,502 INFO [train.py:886] (1/4) Epoch 32, batch 0, loss[loss=0.03147, audio_tagging_loss=0.03147, over 24121.00 frames. ], tot_loss[loss=0.03147, audio_tagging_loss=0.03147, over 24121.00 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:21:54,503 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:22:14,149 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2022, 1.9398, 1.8179, 1.8579], device='cuda:1')
2023-12-20 18:22:15,980 INFO [train.py:917] (1/4) Epoch 32, validation: loss=0.04494, audio_tagging_loss=0.04494, over 3737520.00 frames.
2023-12-20 18:22:15,980 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:22:21,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10746.666666666666, ans=0.19253333333333333
2023-12-20 18:22:25,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=10813.333333333334, ans=0.02161111111111111
2023-12-20 18:22:27,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=10813.333333333334, ans=0.125
2023-12-20 18:22:35,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=11.58
2023-12-20 18:22:38,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.51 vs. limit=15.66
2023-12-20 18:22:40,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10880.0, ans=0.19119999999999998
2023-12-20 18:22:40,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=10880.0, ans=0.021333333333333336
2023-12-20 18:22:44,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=10946.666666666666, ans=0.125
2023-12-20 18:22:47,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=10946.666666666666, ans=0.36419999999999997
2023-12-20 18:22:51,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10946.666666666666, ans=0.19053333333333333
2023-12-20 18:22:58,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11013.333333333334, ans=0.125
2023-12-20 18:22:58,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=11.629999999999999
2023-12-20 18:22:59,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.76
2023-12-20 18:23:02,777 INFO [train.py:886] (1/4) Epoch 32, batch 50, loss[loss=0.02606, audio_tagging_loss=0.02606, over 25000.00 frames. ], tot_loss[loss=0.02763, audio_tagging_loss=0.02763, over 1114963.93 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:23:22,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11093.333333333334, ans=0.18906666666666666
2023-12-20 18:23:25,184 INFO [train.py:886] (1/4) Epoch 33, batch 0, loss[loss=0.03231, audio_tagging_loss=0.03231, over 21830.00 frames. ], tot_loss[loss=0.03231, audio_tagging_loss=0.03231, over 21830.00 frames. ], batch size: 106, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:23:25,185 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:23:46,126 INFO [train.py:917] (1/4) Epoch 33, validation: loss=0.0459, audio_tagging_loss=0.0459, over 3737520.00 frames.
2023-12-20 18:23:46,126 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:23:48,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=11093.333333333334, ans=0.125
2023-12-20 18:23:59,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=11.684999999999999
2023-12-20 18:24:09,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11226.666666666666, ans=0.18773333333333334
2023-12-20 18:24:10,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=11.71
2023-12-20 18:24:20,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11293.333333333334, ans=0.019611111111111107
2023-12-20 18:24:25,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=11360.0, ans=0.125
2023-12-20 18:24:26,656 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 4.449e+01 5.027e+01 5.967e+01 1.050e+02, threshold=1.005e+02, percent-clipped=1.0
2023-12-20 18:24:29,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=4.704
2023-12-20 18:24:33,020 INFO [train.py:886] (1/4) Epoch 33, batch 50, loss[loss=0.02471, audio_tagging_loss=0.02471, over 25000.00 frames. ], tot_loss[loss=0.02602, audio_tagging_loss=0.02602, over 1125427.78 frames. ], batch size: 100, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:24:54,840 INFO [train.py:886] (1/4) Epoch 34, batch 0, loss[loss=0.02736, audio_tagging_loss=0.02736, over 21449.00 frames. ], tot_loss[loss=0.02736, audio_tagging_loss=0.02736, over 21449.00 frames. ], batch size: 106, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:24:54,841 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:25:16,064 INFO [train.py:917] (1/4) Epoch 34, validation: loss=0.0463, audio_tagging_loss=0.0463, over 3737520.00 frames.
2023-12-20 18:25:16,065 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:25:18,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=11440.0, ans=0.019000000000000003
2023-12-20 18:25:32,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=11506.666666666666, ans=0.018722222222222223
2023-12-20 18:25:53,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=4.756
2023-12-20 18:26:02,680 INFO [train.py:886] (1/4) Epoch 34, batch 50, loss[loss=0.02452, audio_tagging_loss=0.02452, over 25000.00 frames. ], tot_loss[loss=0.02557, audio_tagging_loss=0.02557, over 1116622.69 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:26:24,394 INFO [train.py:886] (1/4) Epoch 35, batch 0, loss[loss=0.02645, audio_tagging_loss=0.02645, over 24158.00 frames. ], tot_loss[loss=0.02645, audio_tagging_loss=0.02645, over 24158.00 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:26:24,395 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:26:43,983 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9411, 2.5709, 2.4494, 2.7765], device='cuda:1')
2023-12-20 18:26:45,180 INFO [train.py:917] (1/4) Epoch 35, validation: loss=0.04736, audio_tagging_loss=0.04736, over 3737520.00 frames.
2023-12-20 18:26:45,181 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:26:49,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=7.946666666666666
2023-12-20 18:27:08,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=11920.0, ans=0.4828
2023-12-20 18:27:13,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=11986.666666666666, ans=0.125
2023-12-20 18:27:17,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11986.666666666666, ans=0.18013333333333334
2023-12-20 18:27:23,466 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.764e+01 4.533e+01 5.198e+01 5.955e+01 1.043e+02, threshold=1.040e+02, percent-clipped=1.0
2023-12-20 18:27:33,753 INFO [train.py:886] (1/4) Epoch 35, batch 50, loss[loss=0.02185, audio_tagging_loss=0.02185, over 25000.00 frames. ], tot_loss[loss=0.02465, audio_tagging_loss=0.02465, over 1122465.19 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:27:55,033 INFO [train.py:886] (1/4) Epoch 36, batch 0, loss[loss=0.02595, audio_tagging_loss=0.02595, over 24087.00 frames. ], tot_loss[loss=0.02595, audio_tagging_loss=0.02595, over 24087.00 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:27:55,033 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:28:12,154 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2063, 2.9432, 2.8897, 2.9267], device='cuda:1')
2023-12-20 18:28:16,075 INFO [train.py:917] (1/4) Epoch 36, validation: loss=0.04841, audio_tagging_loss=0.04841, over 3737520.00 frames.
2023-12-20 18:28:16,076 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:28:16,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=8.033333333333333
2023-12-20 18:28:19,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=12133.333333333334, ans=0.01611111111111111
2023-12-20 18:28:23,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=8.033333333333333
2023-12-20 18:28:25,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=12200.0, ans=0.008217391304347826
2023-12-20 18:28:30,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=12.075
2023-12-20 18:28:40,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=12.1
2023-12-20 18:28:43,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=12333.333333333334, ans=0.125
2023-12-20 18:28:51,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=12.125
2023-12-20 18:29:03,191 INFO [train.py:886] (1/4) Epoch 36, batch 50, loss[loss=0.02623, audio_tagging_loss=0.02623, over 25000.00 frames. ], tot_loss[loss=0.02421, audio_tagging_loss=0.02421, over 1120299.31 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:29:24,444 INFO [train.py:886] (1/4) Epoch 37, batch 0, loss[loss=0.03095, audio_tagging_loss=0.03095, over 21211.00 frames. ], tot_loss[loss=0.03095, audio_tagging_loss=0.03095, over 21211.00 frames. ], batch size: 106, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:29:24,445 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:29:34,436 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0752, 4.7395, 4.6235, 4.3424], device='cuda:1')
2023-12-20 18:29:45,682 INFO [train.py:917] (1/4) Epoch 37, validation: loss=0.04928, audio_tagging_loss=0.04928, over 3737520.00 frames.
2023-12-20 18:29:45,683 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:29:51,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=4.872
2023-12-20 18:29:59,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12546.666666666666, ans=0.17453333333333335
2023-12-20 18:30:01,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=12546.666666666666, ans=0.125
2023-12-20 18:30:02,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.205
2023-12-20 18:30:09,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=12613.333333333334, ans=0.45853333333333335
2023-12-20 18:30:11,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=12.23
2023-12-20 18:30:17,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=17.009999999999998
2023-12-20 18:30:19,000 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.554e+01 4.732e+01 5.545e+01 6.466e+01 1.044e+02, threshold=1.109e+02, percent-clipped=1.0
2023-12-20 18:30:22,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=12680.0, ans=0.013833333333333336
2023-12-20 18:30:27,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=12746.666666666666, ans=0.125
2023-12-20 18:30:30,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=12746.666666666666, ans=0.02
2023-12-20 18:30:32,821 INFO [train.py:886] (1/4) Epoch 37, batch 50, loss[loss=0.02155, audio_tagging_loss=0.02155, over 25000.00 frames. ], tot_loss[loss=0.02298, audio_tagging_loss=0.02298, over 1120275.84 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:30:55,799 INFO [train.py:886] (1/4) Epoch 38, batch 0, loss[loss=0.01982, audio_tagging_loss=0.01982, over 25000.00 frames. ], tot_loss[loss=0.01982, audio_tagging_loss=0.01982, over 25000.00 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:30:55,799 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:31:16,995 INFO [train.py:917] (1/4) Epoch 38, validation: loss=0.04916, audio_tagging_loss=0.04916, over 3737520.00 frames.
2023-12-20 18:31:16,996 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:31:20,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:24,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=12826.666666666666, ans=0.013222222222222225
2023-12-20 18:31:26,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=12893.333333333334, ans=0.035
2023-12-20 18:31:39,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12960.0, ans=0.1704
2023-12-20 18:31:40,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.25 vs. limit=17.22
2023-12-20 18:32:04,831 INFO [train.py:886] (1/4) Epoch 38, batch 50, loss[loss=0.02165, audio_tagging_loss=0.02165, over 25000.00 frames. ], tot_loss[loss=0.02196, audio_tagging_loss=0.02196, over 1123455.74 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:32:26,418 INFO [train.py:886] (1/4) Epoch 39, batch 0, loss[loss=0.0235, audio_tagging_loss=0.0235, over 24081.00 frames. ], tot_loss[loss=0.0235, audio_tagging_loss=0.0235, over 24081.00 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:32:26,419 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:32:46,859 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.7960, 2.1194, 2.1444, 2.2305, 2.1937, 2.3078, 2.0764, 2.0770], device='cuda:1')
2023-12-20 18:32:47,552 INFO [train.py:917] (1/4) Epoch 39, validation: loss=0.05058, audio_tagging_loss=0.05058, over 3737520.00 frames.
2023-12-20 18:32:47,552 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:32:49,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.94 vs. limit=11.586666666666666
2023-12-20 18:32:53,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.440000000000001
2023-12-20 18:33:05,195 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.626e-01
2023-12-20 18:33:05,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.60 vs. limit=8.31
2023-12-20 18:33:14,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=13306.666666666666, ans=0.43426666666666675
2023-12-20 18:33:17,346 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.971e+01 5.139e+01 5.911e+01 6.986e+01 1.449e+02, threshold=1.182e+02, percent-clipped=3.0
2023-12-20 18:33:26,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13440.0, ans=0.1656
2023-12-20 18:33:33,695 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.493e-01
2023-12-20 18:33:35,341 INFO [train.py:886] (1/4) Epoch 39, batch 50, loss[loss=0.02042, audio_tagging_loss=0.02042, over 25000.00 frames. ], tot_loss[loss=0.02149, audio_tagging_loss=0.02149, over 1121525.25 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:33:57,926 INFO [train.py:886] (1/4) Epoch 40, batch 0, loss[loss=0.02105, audio_tagging_loss=0.02105, over 24062.00 frames. ], tot_loss[loss=0.02105, audio_tagging_loss=0.02105, over 24062.00 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:33:57,927 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:34:19,045 INFO [train.py:917] (1/4) Epoch 40, validation: loss=0.05208, audio_tagging_loss=0.05208, over 3737520.00 frames.
2023-12-20 18:34:19,046 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:34:40,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=13653.333333333334, ans=0.125
2023-12-20 18:34:44,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=13653.333333333334, ans=0.8865333333333333
2023-12-20 18:34:55,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=13720.0, ans=0.009500000000000001
2023-12-20 18:35:04,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.64 vs. limit=8.446666666666665
2023-12-20 18:35:06,547 INFO [train.py:886] (1/4) Epoch 40, batch 50, loss[loss=0.02269, audio_tagging_loss=0.02269, over 25000.00 frames. ], tot_loss[loss=0.02017, audio_tagging_loss=0.02017, over 1121341.24 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:35:29,525 INFO [train.py:886] (1/4) Epoch 41, batch 0, loss[loss=0.01894, audio_tagging_loss=0.01894, over 25000.00 frames. ], tot_loss[loss=0.01894, audio_tagging_loss=0.01894, over 25000.00 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:35:29,526 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:35:47,521 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3157, 2.9804, 3.3457, 3.0635], device='cuda:1')
2023-12-20 18:35:50,409 INFO [train.py:917] (1/4) Epoch 41, validation: loss=0.05259, audio_tagging_loss=0.05259, over 3737520.00 frames.
2023-12-20 18:35:50,409 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:36:14,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=14000.0, ans=0.125
2023-12-20 18:36:16,658 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.775e+01 5.160e+01 5.694e+01 6.780e+01 1.124e+02, threshold=1.139e+02, percent-clipped=0.0
2023-12-20 18:36:17,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=14000.0, ans=0.10999999999999999
2023-12-20 18:36:31,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=14133.333333333334, ans=0.125
2023-12-20 18:36:37,870 INFO [train.py:886] (1/4) Epoch 41, batch 50, loss[loss=0.0187, audio_tagging_loss=0.0187, over 25000.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 1116780.63 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:37:00,622 INFO [train.py:886] (1/4) Epoch 42, batch 0, loss[loss=0.01991, audio_tagging_loss=0.01991, over 24159.00 frames. ], tot_loss[loss=0.01991, audio_tagging_loss=0.01991, over 24159.00 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:37:00,622 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:37:21,715 INFO [train.py:917] (1/4) Epoch 42, validation: loss=0.0541, audio_tagging_loss=0.0541, over 3737520.00 frames.
2023-12-20 18:37:21,716 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:37:27,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=14213.333333333334, ans=0.125
2023-12-20 18:37:34,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:35,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:35,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=14280.0, ans=0.8928
2023-12-20 18:37:41,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=14346.666666666666, ans=0.4152
2023-12-20 18:37:48,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=14346.666666666666, ans=0.006888888888888889
2023-12-20 18:38:09,808 INFO [train.py:886] (1/4) Epoch 42, batch 50, loss[loss=0.01856, audio_tagging_loss=0.01856, over 25000.00 frames. ], tot_loss[loss=0.01859, audio_tagging_loss=0.01859, over 1119067.54 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:38:32,307 INFO [train.py:886] (1/4) Epoch 43, batch 0, loss[loss=0.02681, audio_tagging_loss=0.02681, over 20614.00 frames. ], tot_loss[loss=0.02681, audio_tagging_loss=0.02681, over 20614.00 frames. ], batch size: 106, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:38:32,308 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:38:40,438 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1897, 2.6265, 2.5322, 2.9896], device='cuda:1')
2023-12-20 18:38:53,029 INFO [train.py:917] (1/4) Epoch 43, validation: loss=0.05602, audio_tagging_loss=0.05602, over 3737520.00 frames.
2023-12-20 18:38:53,030 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:38:53,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=14560.0, ans=0.006000000000000005
2023-12-20 18:39:01,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=12.96
2023-12-20 18:39:02,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=14560.0, ans=0.125
2023-12-20 18:39:02,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=8.64
2023-12-20 18:39:16,031 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.316e+01 5.471e+01 6.063e+01 6.688e+01 1.130e+02, threshold=1.213e+02, percent-clipped=0.0
2023-12-20 18:39:17,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=9.877333333333333
2023-12-20 18:39:29,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=14760.0, ans=0.3834000000000001
2023-12-20 18:39:38,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=14826.666666666666, ans=0.125
2023-12-20 18:39:39,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=13.059999999999999
2023-12-20 18:39:41,477 INFO [train.py:886] (1/4) Epoch 43, batch 50, loss[loss=0.0175, audio_tagging_loss=0.0175, over 25000.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 1117128.51 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:40:04,355 INFO [train.py:886] (1/4) Epoch 44, batch 0, loss[loss=0.01599, audio_tagging_loss=0.01599, over 24088.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 24088.00 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:40:04,355 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:40:25,327 INFO [train.py:917] (1/4) Epoch 44, validation: loss=0.05682, audio_tagging_loss=0.05682, over 3737520.00 frames.
2023-12-20 18:40:25,328 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:40:31,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=14906.666666666666, ans=0.125
2023-12-20 18:40:32,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.58 vs. limit=13.09
2023-12-20 18:40:40,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14973.333333333334, ans=0.125
2023-12-20 18:40:40,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14973.333333333334, ans=0.15026666666666666
2023-12-20 18:40:52,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15040.0, ans=0.125
2023-12-20 18:40:57,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15106.666666666666, ans=0.125
2023-12-20 18:41:10,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=15173.333333333334, ans=0.09826666666666664
2023-12-20 18:41:10,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=13.190000000000001
2023-12-20 18:41:12,865 INFO [train.py:886] (1/4) Epoch 44, batch 50, loss[loss=0.01775, audio_tagging_loss=0.01775, over 25000.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 1115327.13 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:41:35,895 INFO [train.py:886] (1/4) Epoch 45, batch 0, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-20 18:41:35,896 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:41:56,901 INFO [train.py:917] (1/4) Epoch 45, validation: loss=0.05811, audio_tagging_loss=0.05811, over 3737520.00 frames.
2023-12-20 18:41:56,902 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:42:05,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=15253.333333333334, ans=0.003111111111111106
2023-12-20 18:42:15,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.876e+01 5.082e+01 5.625e+01 6.615e+01 1.122e+02, threshold=1.125e+02, percent-clipped=0.0
2023-12-20 18:42:23,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=15386.666666666666, ans=0.002555555555555554
2023-12-20 18:42:26,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15453.333333333334, ans=0.125
2023-12-20 18:42:29,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15453.333333333334, ans=0.14546666666666666
2023-12-20 18:42:34,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=13.32
2023-12-20 18:42:44,420 INFO [train.py:886] (1/4) Epoch 45, batch 50, loss[loss=0.01693, audio_tagging_loss=0.01693, over 25000.00 frames. ], tot_loss[loss=0.01692, audio_tagging_loss=0.01692, over 1112449.27 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 64.0
2023-12-20 18:43:02,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=15600.0, ans=0.0
2023-12-20 18:43:03,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=13.35
2023-12-20 18:43:06,810 INFO [train.py:886] (1/4) Epoch 46, batch 0, loss[loss=0.01734, audio_tagging_loss=0.01734, over 24100.00 frames. ], tot_loss[loss=0.01734, audio_tagging_loss=0.01734, over 24100.00 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:43:06,811 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:43:18,196 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5318, 2.7592, 2.9153, 2.9462], device='cuda:1')
2023-12-20 18:43:27,878 INFO [train.py:917] (1/4) Epoch 46, validation: loss=0.05956, audio_tagging_loss=0.05956, over 3737520.00 frames.
2023-12-20 18:43:27,879 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:43:34,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=15600.0, ans=0.007478260869565217
2023-12-20 18:43:37,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=15666.666666666666, ans=10.0
2023-12-20 18:43:39,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=15666.666666666666, ans=0.9066666666666666
2023-12-20 18:43:57,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=13.425
2023-12-20 18:44:07,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=15866.666666666666, ans=0.125
2023-12-20 18:44:08,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=10.346666666666668
2023-12-20 18:44:15,171 INFO [train.py:886] (1/4) Epoch 46, batch 50, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 1126431.18 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:44:38,146 INFO [train.py:886] (1/4) Epoch 47, batch 0, loss[loss=0.01727, audio_tagging_loss=0.01727, over 24049.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 24049.00 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:44:38,147 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:44:48,611 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3128, 2.9586, 3.3100, 2.8855], device='cuda:1')
2023-12-20 18:44:59,322 INFO [train.py:917] (1/4) Epoch 47, validation: loss=0.06125, audio_tagging_loss=0.06125, over 3737520.00 frames.
2023-12-20 18:44:59,323 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:45:01,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15946.666666666666, ans=0.14053333333333334
2023-12-20 18:45:01,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=15946.666666666666, ans=0.125
2023-12-20 18:45:06,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=13.48
2023-12-20 18:45:14,000 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.428e+01 5.199e+01 5.973e+01 6.776e+01 1.435e+02, threshold=1.195e+02, percent-clipped=1.0
2023-12-20 18:45:16,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=16013.333333333334, ans=0.33953333333333335
2023-12-20 18:45:32,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=16146.666666666666, ans=0.007359420289855072
2023-12-20 18:45:32,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=16146.666666666666, ans=0.125
2023-12-20 18:45:38,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16213.333333333334, ans=0.13786666666666667
2023-12-20 18:45:40,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.53 vs. limit=13.58
2023-12-20 18:45:46,334 INFO [train.py:886] (1/4) Epoch 47, batch 50, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 1119525.87 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:46:04,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=16293.333333333334, ans=0.125
2023-12-20 18:46:08,721 INFO [train.py:886] (1/4) Epoch 48, batch 0, loss[loss=0.03003, audio_tagging_loss=0.03003, over 21467.00 frames. ], tot_loss[loss=0.03003, audio_tagging_loss=0.03003, over 21467.00 frames. ], batch size: 106, lr: 1.20e-02, grad_scale: 64.0
2023-12-20 18:46:08,722 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:46:29,405 INFO [train.py:917] (1/4) Epoch 48, validation: loss=0.06238, audio_tagging_loss=0.06238, over 3737520.00 frames.
2023-12-20 18:46:29,406 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:46:45,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=16360.0, ans=0.125
2023-12-20 18:46:55,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=5.464
2023-12-20 18:47:12,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=13.71
2023-12-20 18:47:13,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=13.71
2023-12-20 18:47:16,769 INFO [train.py:886] (1/4) Epoch 48, batch 50, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 1119249.00 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-20 18:47:37,824 INFO [train.py:886] (1/4) Epoch 49, batch 0, loss[loss=0.0168, audio_tagging_loss=0.0168, over 24182.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 24182.00 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:47:37,825 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:47:58,816 INFO [train.py:917] (1/4) Epoch 49, validation: loss=0.06394, audio_tagging_loss=0.06394, over 3737520.00 frames.
2023-12-20 18:47:58,817 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:48:07,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=5.496
2023-12-20 18:48:07,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=16706.666666666668, ans=0.125
2023-12-20 18:48:09,447 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.348e+01 5.324e+01 6.019e+01 6.956e+01 1.317e+02, threshold=1.204e+02, percent-clipped=1.0
2023-12-20 18:48:21,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=16773.333333333332, ans=0.125
2023-12-20 18:48:28,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=16840.0, ans=0.125
2023-12-20 18:48:42,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=16906.666666666668, ans=0.125
2023-12-20 18:48:45,803 INFO [train.py:886] (1/4) Epoch 49, batch 50, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 1119978.83 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:49:07,494 INFO [train.py:886] (1/4) Epoch 50, batch 0, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24192.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 24192.00 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 64.0
2023-12-20 18:49:07,495 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:49:17,367 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1453, 2.7847, 2.7589, 2.6183], device='cuda:1')
2023-12-20 18:49:28,225 INFO [train.py:917] (1/4) Epoch 50, validation: loss=0.06678, audio_tagging_loss=0.06678, over 3737520.00 frames.
2023-12-20 18:49:28,226 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:49:33,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=16986.666666666668, ans=0.0
2023-12-20 18:50:05,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=10.901333333333334
2023-12-20 18:50:06,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=17253.333333333332, ans=0.125
2023-12-20 18:50:08,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=13.969999999999999
2023-12-20 18:50:14,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17320.0, ans=0.0
2023-12-20 18:50:15,459 INFO [train.py:886] (1/4) Epoch 50, batch 50, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 1124283.53 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 32.0
2023-12-20 18:50:18,119 INFO [train.py:1099] (1/4) Done!
|