[2024-09-02 17:03:02,219][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-09-02 17:03:02,226][Main][INFO] - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
[2024-09-02 17:03:02,227][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-09-02/17-03-02
[2024-09-02 17:14:53,691][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 51.971 | Grad_l2 --> 82.676 | Weights_l2 --> 7042.062 | Lr --> 0.010 | Seconds_per_step --> 6.760 |
[2024-09-02 17:20:23,699][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 14.150 | Grad_l2 --> 19.390 | Weights_l2 --> 7034.376 | Lr --> 0.010 | Seconds_per_step --> 3.300 |
[2024-09-02 17:25:54,840][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 9.006 | Grad_l2 --> 9.061 | Weights_l2 --> 7026.824 | Lr --> 0.010 | Seconds_per_step --> 3.311 |
[2024-09-02 17:31:26,095][Main][INFO] - [train] Step 400 out of 65536 | Loss --> 7.529 | Grad_l2 --> 5.889 | Weights_l2 --> 7019.014 | Lr --> 0.010 | Seconds_per_step --> 3.313 |
[2024-09-02 17:36:56,190][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 6.618 | Grad_l2 --> 4.039 | Weights_l2 --> 7010.897 | Lr --> 0.011 | Seconds_per_step --> 3.301 |
[2024-09-02 17:42:27,693][Main][INFO] - [train] Step 600 out of 65536 | Loss --> 5.994 | Grad_l2 --> 2.962 | Weights_l2 --> 7002.549 | Lr --> 0.011 | Seconds_per_step --> 3.315 |
[2024-09-02 17:47:57,967][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 5.703 | Grad_l2 --> 2.434 | Weights_l2 --> 6994.267 | Lr --> 0.011 | Seconds_per_step --> 3.303 |
[2024-09-02 17:53:29,228][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 6.603 | Grad_l2 --> 6.221 | Weights_l2 --> 6985.927 | Lr --> 0.011 | Seconds_per_step --> 3.313 |
[2024-09-02 17:59:00,011][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 5.408 | Grad_l2 --> 1.465 | Weights_l2 --> 6980.026 | Lr --> 0.011 | Seconds_per_step --> 3.308 |
[2024-09-02 18:04:30,275][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 5.311 | Grad_l2 --> 0.992 | Weights_l2 --> 6975.109 | Lr --> 0.011 | Seconds_per_step --> 3.303 |
[2024-09-02 18:10:01,468][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 5.241 | Grad_l2 --> 0.854 | Weights_l2 --> 6970.708 | Lr --> 0.011 | Seconds_per_step --> 3.312 |
[2024-09-02 18:15:33,362][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 5.180 | Grad_l2 --> 0.838 | Weights_l2 --> 6966.641 | Lr --> 0.011 | Seconds_per_step --> 3.319 |
[2024-09-02 18:21:03,902][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 5.126 | Grad_l2 --> 0.764 | Weights_l2 --> 6962.789 | Lr --> 0.011 | Seconds_per_step --> 3.305 |
[2024-09-02 18:26:35,349][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 5.088 | Grad_l2 --> 0.744 | Weights_l2 --> 6959.146 | Lr --> 0.011 | Seconds_per_step --> 3.314 |
[2024-09-02 18:32:06,048][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 5.046 | Grad_l2 --> 0.702 | Weights_l2 --> 6955.673 | Lr --> 0.012 | Seconds_per_step --> 3.307 |
[2024-09-02 18:37:37,903][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 5.007 | Grad_l2 --> 0.691 | Weights_l2 --> 6952.523 | Lr --> 0.012 | Seconds_per_step --> 3.319 |
[2024-09-02 18:43:09,723][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 4.973 | Grad_l2 --> 0.673 | Weights_l2 --> 6949.412 | Lr --> 0.012 | Seconds_per_step --> 3.318 |
[2024-09-02 18:48:40,909][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 4.943 | Grad_l2 --> 0.671 | Weights_l2 --> 6946.498 | Lr --> 0.012 | Seconds_per_step --> 3.312 |
[2024-09-02 18:54:13,524][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 4.929 | Grad_l2 --> 0.668 | Weights_l2 --> 6943.795 | Lr --> 0.012 | Seconds_per_step --> 3.326 |
[2024-09-02 18:59:45,500][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 4.894 | Grad_l2 --> 0.665 | Weights_l2 --> 6941.241 | Lr --> 0.012 | Seconds_per_step --> 3.320 |
[2024-09-02 19:05:16,395][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 4.881 | Grad_l2 --> 0.713 | Weights_l2 --> 6938.861 | Lr --> 0.012 | Seconds_per_step --> 3.309 |
[2024-09-02 19:10:48,520][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 4.853 | Grad_l2 --> 0.653 | Weights_l2 --> 6936.551 | Lr --> 0.012 | Seconds_per_step --> 3.321 |
[2024-09-02 19:16:19,278][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 4.829 | Grad_l2 --> 0.646 | Weights_l2 --> 6934.357 | Lr --> 0.012 | Seconds_per_step --> 3.308 |
[2024-09-02 19:21:51,370][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 4.790 | Grad_l2 --> 0.620 | Weights_l2 --> 6932.338 | Lr --> 0.012 | Seconds_per_step --> 3.321 |
[2024-09-02 19:27:23,544][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 4.784 | Grad_l2 --> 0.643 | Weights_l2 --> 6930.395 | Lr --> 0.013 | Seconds_per_step --> 3.322 |
[2024-09-02 19:32:54,341][Main][INFO] - [train] Step 2600 out of 65536 | Loss --> 4.755 | Grad_l2 --> 0.623 | Weights_l2 --> 6928.543 | Lr --> 0.013 | Seconds_per_step --> 3.308 |
[2024-09-02 19:38:25,942][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 4.743 | Grad_l2 --> 0.636 | Weights_l2 --> 6926.944 | Lr --> 0.013 | Seconds_per_step --> 3.316 |
[2024-09-02 19:43:57,708][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 4.722 | Grad_l2 --> 0.590 | Weights_l2 --> 6925.379 | Lr --> 0.013 | Seconds_per_step --> 3.318 |
[2024-09-02 19:49:28,285][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 4.715 | Grad_l2 --> 0.622 | Weights_l2 --> 6924.007 | Lr --> 0.013 | Seconds_per_step --> 3.306 |
[2024-09-02 19:54:59,957][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 4.694 | Grad_l2 --> 0.652 | Weights_l2 --> 6922.709 | Lr --> 0.013 | Seconds_per_step --> 3.317 |
[2024-09-02 20:00:31,072][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 4.678 | Grad_l2 --> 0.614 | Weights_l2 --> 6921.561 | Lr --> 0.013 | Seconds_per_step --> 3.311 |
[2024-09-02 20:06:02,747][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 4.633 | Grad_l2 --> 0.610 | Weights_l2 --> 6920.463 | Lr --> 0.013 | Seconds_per_step --> 3.317 |
[2024-09-02 20:11:34,607][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 4.599 | Grad_l2 --> 0.638 | Weights_l2 --> 6919.642 | Lr --> 0.013 | Seconds_per_step --> 3.319 |
[2024-09-02 20:17:05,731][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 4.549 | Grad_l2 --> 0.774 | Weights_l2 --> 6919.263 | Lr --> 0.013 | Seconds_per_step --> 3.311 |
[2024-09-02 20:22:37,601][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 4.420 | Grad_l2 --> 0.934 | Weights_l2 --> 6918.974 | Lr --> 0.014 | Seconds_per_step --> 3.319 |
[2024-09-02 20:28:09,554][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 4.256 | Grad_l2 --> 0.763 | Weights_l2 --> 6919.477 | Lr --> 0.014 | Seconds_per_step --> 3.319 |
[2024-09-02 20:33:40,654][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 4.131 | Grad_l2 --> 0.657 | Weights_l2 --> 6920.705 | Lr --> 0.014 | Seconds_per_step --> 3.311 |
[2024-09-02 20:39:13,064][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 4.021 | Grad_l2 --> 0.709 | Weights_l2 --> 6922.188 | Lr --> 0.014 | Seconds_per_step --> 3.324 |
[2024-09-02 20:44:45,663][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 3.909 | Grad_l2 --> 0.637 | Weights_l2 --> 6923.666 | Lr --> 0.014 | Seconds_per_step --> 3.326 |
[2024-09-02 20:50:16,811][Main][INFO] - [train] Step 4000 out of 65536 | Loss --> 3.855 | Grad_l2 --> 1.013 | Weights_l2 --> 6923.778 | Lr --> 0.014 | Seconds_per_step --> 3.311 |
[2024-09-02 20:55:49,235][Main][INFO] - [train] Step 4100 out of 65536 | Loss --> 3.770 | Grad_l2 --> 0.589 | Weights_l2 --> 6925.545 | Lr --> 0.014 | Seconds_per_step --> 3.324 |
[2024-09-02 21:01:20,500][Main][INFO] - [train] Step 4200 out of 65536 | Loss --> 3.710 | Grad_l2 --> 0.579 | Weights_l2 --> 6927.200 | Lr --> 0.014 | Seconds_per_step --> 3.313 |
[2024-09-02 21:06:53,406][Main][INFO] - [train] Step 4300 out of 65536 | Loss --> 3.651 | Grad_l2 --> 0.588 | Weights_l2 --> 6928.842 | Lr --> 0.014 | Seconds_per_step --> 3.329 |
[2024-09-02 21:12:26,298][Main][INFO] - [train] Step 4400 out of 65536 | Loss --> 3.614 | Grad_l2 --> 0.632 | Weights_l2 --> 6930.597 | Lr --> 0.014 | Seconds_per_step --> 3.329 |
[2024-09-02 21:17:57,623][Main][INFO] - [train] Step 4500 out of 65536 | Loss --> 3.582 | Grad_l2 --> 0.884 | Weights_l2 --> 6931.569 | Lr --> 0.015 | Seconds_per_step --> 3.313 |
[2024-09-02 21:23:30,116][Main][INFO] - [train] Step 4600 out of 65536 | Loss --> 3.527 | Grad_l2 --> 0.582 | Weights_l2 --> 6933.783 | Lr --> 0.015 | Seconds_per_step --> 3.325 |
[2024-09-02 21:29:02,417][Main][INFO] - [train] Step 4700 out of 65536 | Loss --> 3.476 | Grad_l2 --> 0.549 | Weights_l2 --> 6935.959 | Lr --> 0.015 | Seconds_per_step --> 3.323 |
[2024-09-02 21:34:33,535][Main][INFO] - [train] Step 4800 out of 65536 | Loss --> 3.430 | Grad_l2 --> 0.551 | Weights_l2 --> 6938.224 | Lr --> 0.015 | Seconds_per_step --> 3.311 |
[2024-09-02 21:40:05,905][Main][INFO] - [train] Step 4900 out of 65536 | Loss --> 3.395 | Grad_l2 --> 0.550 | Weights_l2 --> 6940.617 | Lr --> 0.015 | Seconds_per_step --> 3.324 |
[2024-09-02 21:45:36,944][Main][INFO] - [train] Step 5000 out of 65536 | Loss --> 3.366 | Grad_l2 --> 0.546 | Weights_l2 --> 6943.230 | Lr --> 0.015 | Seconds_per_step --> 3.310 |
[2024-09-02 21:45:36,947][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-5000
[2024-09-02 21:45:36,954][accelerate.utils.other][WARNING] - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-09-02 21:45:44,182][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-5000/model.safetensors
[2024-09-02 21:45:54,822][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-5000/optimizer.bin
[2024-09-02 21:45:54,827][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-5000/scheduler.bin
[2024-09-02 21:45:54,828][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-5000/sampler.bin
[2024-09-02 21:45:54,829][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-5000/sampler_1.bin
[2024-09-02 21:45:54,835][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-5000/random_states_0.pkl
[2024-09-02 21:51:26,402][Main][INFO] - [train] Step 5100 out of 65536 | Loss --> 3.302 | Grad_l2 --> 0.541 | Weights_l2 --> 6946.278 | Lr --> 0.015 | Seconds_per_step --> 3.495 |
[2024-09-02 21:56:58,321][Main][INFO] - [train] Step 5200 out of 65536 | Loss --> 3.248 | Grad_l2 --> 0.556 | Weights_l2 --> 6950.060 | Lr --> 0.015 | Seconds_per_step --> 3.319 |
[2024-09-02 22:02:29,452][Main][INFO] - [train] Step 5300 out of 65536 | Loss --> 3.194 | Grad_l2 --> 0.566 | Weights_l2 --> 6954.461 | Lr --> 0.015 | Seconds_per_step --> 3.311 |
[2024-09-02 22:08:01,594][Main][INFO] - [train] Step 5400 out of 65536 | Loss --> 3.144 | Grad_l2 --> 0.548 | Weights_l2 --> 6959.061 | Lr --> 0.015 | Seconds_per_step --> 3.321 |
[2024-09-02 22:13:33,473][Main][INFO] - [train] Step 5500 out of 65536 | Loss --> 3.099 | Grad_l2 --> 0.546 | Weights_l2 --> 6963.676 | Lr --> 0.016 | Seconds_per_step --> 3.319 |
[2024-09-02 22:19:04,763][Main][INFO] - [train] Step 5600 out of 65536 | Loss --> 3.044 | Grad_l2 --> 0.531 | Weights_l2 --> 6968.055 | Lr --> 0.016 | Seconds_per_step --> 3.313 |
[2024-09-02 22:24:37,024][Main][INFO] - [train] Step 5700 out of 65536 | Loss --> 3.023 | Grad_l2 --> 0.528 | Weights_l2 --> 6972.595 | Lr --> 0.016 | Seconds_per_step --> 3.323 |
[2024-09-02 22:30:08,010][Main][INFO] - [train] Step 5800 out of 65536 | Loss --> 2.999 | Grad_l2 --> 0.529 | Weights_l2 --> 6977.095 | Lr --> 0.016 | Seconds_per_step --> 3.310 |
[2024-09-02 22:35:40,260][Main][INFO] - [train] Step 5900 out of 65536 | Loss --> 2.953 | Grad_l2 --> 0.516 | Weights_l2 --> 6981.522 | Lr --> 0.016 | Seconds_per_step --> 3.322 |
[2024-09-02 22:41:12,494][Main][INFO] - [train] Step 6000 out of 65536 | Loss --> 2.924 | Grad_l2 --> 0.514 | Weights_l2 --> 6985.860 | Lr --> 0.016 | Seconds_per_step --> 3.322 |
[2024-09-02 22:46:43,439][Main][INFO] - [train] Step 6100 out of 65536 | Loss --> 2.904 | Grad_l2 --> 0.500 | Weights_l2 --> 6990.209 | Lr --> 0.016 | Seconds_per_step --> 3.309 |
[2024-09-02 22:52:15,361][Main][INFO] - [train] Step 6200 out of 65536 | Loss --> 2.885 | Grad_l2 --> 0.499 | Weights_l2 --> 6994.575 | Lr --> 0.016 | Seconds_per_step --> 3.319 |
[2024-09-02 22:57:47,371][Main][INFO] - [train] Step 6300 out of 65536 | Loss --> 2.860 | Grad_l2 --> 0.496 | Weights_l2 --> 6998.855 | Lr --> 0.016 | Seconds_per_step --> 3.320 |
[2024-09-02 23:03:18,243][Main][INFO] - [train] Step 6400 out of 65536 | Loss --> 2.828 | Grad_l2 --> 0.486 | Weights_l2 --> 7003.354 | Lr --> 0.016 | Seconds_per_step --> 3.309 |
[2024-09-02 23:08:50,256][Main][INFO] - [train] Step 6500 out of 65536 | Loss --> 2.823 | Grad_l2 --> 0.491 | Weights_l2 --> 7007.772 | Lr --> 0.017 | Seconds_per_step --> 3.320 |
[2024-09-02 23:14:21,254][Main][INFO] - [train] Step 6600 out of 65536 | Loss --> 2.801 | Grad_l2 --> 0.572 | Weights_l2 --> 7012.034 | Lr --> 0.017 | Seconds_per_step --> 3.310 |
[2024-09-02 23:19:53,383][Main][INFO] - [train] Step 6700 out of 65536 | Loss --> 2.776 | Grad_l2 --> 0.473 | Weights_l2 --> 7016.624 | Lr --> 0.017 | Seconds_per_step --> 3.321 |
[2024-09-02 23:25:25,894][Main][INFO] - [train] Step 6800 out of 65536 | Loss --> 2.764 | Grad_l2 --> 0.489 | Weights_l2 --> 7021.128 | Lr --> 0.017 | Seconds_per_step --> 3.325 |
[2024-09-02 23:30:56,990][Main][INFO] - [train] Step 6900 out of 65536 | Loss --> 2.754 | Grad_l2 --> 0.467 | Weights_l2 --> 7025.909 | Lr --> 0.017 | Seconds_per_step --> 3.311 |
[2024-09-02 23:36:28,837][Main][INFO] - [train] Step 7000 out of 65536 | Loss --> 2.716 | Grad_l2 --> 0.469 | Weights_l2 --> 7030.583 | Lr --> 0.017 | Seconds_per_step --> 3.318 |
[2024-09-02 23:42:00,897][Main][INFO] - [train] Step 7100 out of 65536 | Loss --> 2.706 | Grad_l2 --> 0.470 | Weights_l2 --> 7035.338 | Lr --> 0.017 | Seconds_per_step --> 3.321 |
[2024-09-02 23:47:31,913][Main][INFO] - [train] Step 7200 out of 65536 | Loss --> 2.685 | Grad_l2 --> 0.460 | Weights_l2 --> 7040.107 | Lr --> 0.017 | Seconds_per_step --> 3.310 |
[2024-09-02 23:53:04,028][Main][INFO] - [train] Step 7300 out of 65536 | Loss --> 2.675 | Grad_l2 --> 0.462 | Weights_l2 --> 7044.921 | Lr --> 0.017 | Seconds_per_step --> 3.321 |
[2024-09-02 23:58:35,224][Main][INFO] - [train] Step 7400 out of 65536 | Loss --> 2.670 | Grad_l2 --> 0.473 | Weights_l2 --> 7049.994 | Lr --> 0.017 | Seconds_per_step --> 3.312 |
[2024-09-03 00:04:07,495][Main][INFO] - [train] Step 7500 out of 65536 | Loss --> 2.653 | Grad_l2 --> 0.452 | Weights_l2 --> 7055.123 | Lr --> 0.018 | Seconds_per_step --> 3.323 |
[2024-09-03 00:09:39,687][Main][INFO] - [train] Step 7600 out of 65536 | Loss --> 2.644 | Grad_l2 --> 0.499 | Weights_l2 --> 7060.263 | Lr --> 0.018 | Seconds_per_step --> 3.322 |
[2024-09-03 00:15:11,125][Main][INFO] - [train] Step 7700 out of 65536 | Loss --> 2.619 | Grad_l2 --> 0.451 | Weights_l2 --> 7065.593 | Lr --> 0.018 | Seconds_per_step --> 3.314 |
[2024-09-03 00:20:43,656][Main][INFO] - [train] Step 7800 out of 65536 | Loss --> 2.611 | Grad_l2 --> 0.444 | Weights_l2 --> 7071.016 | Lr --> 0.018 | Seconds_per_step --> 3.325 |
[2024-09-03 00:26:15,825][Main][INFO] - [train] Step 7900 out of 65536 | Loss --> 2.593 | Grad_l2 --> 0.444 | Weights_l2 --> 7076.338 | Lr --> 0.018 | Seconds_per_step --> 3.322 |
[2024-09-03 00:31:46,986][Main][INFO] - [train] Step 8000 out of 65536 | Loss --> 2.591 | Grad_l2 --> 0.707 | Weights_l2 --> 7081.619 | Lr --> 0.018 | Seconds_per_step --> 3.312 |
[2024-09-03 00:37:19,240][Main][INFO] - [train] Step 8100 out of 65536 | Loss --> 2.583 | Grad_l2 --> 0.504 | Weights_l2 --> 7087.303 | Lr --> 0.018 | Seconds_per_step --> 3.323 |
[2024-09-03 00:42:50,497][Main][INFO] - [train] Step 8200 out of 65536 | Loss --> 2.572 | Grad_l2 --> 0.435 | Weights_l2 --> 7092.976 | Lr --> 0.018 | Seconds_per_step --> 3.313 |
[2024-09-03 00:48:22,669][Main][INFO] - [train] Step 8300 out of 65536 | Loss --> 2.550 | Grad_l2 --> 0.444 | Weights_l2 --> 7098.242 | Lr --> 0.018 | Seconds_per_step --> 3.322 |
[2024-09-03 00:53:54,859][Main][INFO] - [train] Step 8400 out of 65536 | Loss --> 2.533 | Grad_l2 --> 0.424 | Weights_l2 --> 7103.870 | Lr --> 0.018 | Seconds_per_step --> 3.322 |
[2024-09-03 00:59:25,959][Main][INFO] - [train] Step 8500 out of 65536 | Loss --> 2.520 | Grad_l2 --> 0.415 | Weights_l2 --> 7109.426 | Lr --> 0.019 | Seconds_per_step --> 3.311 |
[2024-09-03 01:04:58,102][Main][INFO] - [train] Step 8600 out of 65536 | Loss --> 2.512 | Grad_l2 --> 0.445 | Weights_l2 --> 7115.243 | Lr --> 0.019 | Seconds_per_step --> 3.321 |
[2024-09-03 01:10:30,308][Main][INFO] - [train] Step 8700 out of 65536 | Loss --> 2.497 | Grad_l2 --> 0.416 | Weights_l2 --> 7120.917 | Lr --> 0.019 | Seconds_per_step --> 3.322 |
[2024-09-03 01:16:01,412][Main][INFO] - [train] Step 8800 out of 65536 | Loss --> 2.503 | Grad_l2 --> 0.453 | Weights_l2 --> 7127.067 | Lr --> 0.019 | Seconds_per_step --> 3.311 |
[2024-09-03 01:21:33,679][Main][INFO] - [train] Step 8900 out of 65536 | Loss --> 2.498 | Grad_l2 --> 0.519 | Weights_l2 --> 7133.268 | Lr --> 0.019 | Seconds_per_step --> 3.323 |
[2024-09-03 01:27:05,633][Main][INFO] - [train] Step 9000 out of 65536 | Loss --> 2.480 | Grad_l2 --> 0.413 | Weights_l2 --> 7139.449 | Lr --> 0.019 | Seconds_per_step --> 3.320 |
[2024-09-03 01:32:36,839][Main][INFO] - [train] Step 9100 out of 65536 | Loss --> 2.488 | Grad_l2 --> 0.429 | Weights_l2 --> 7145.663 | Lr --> 0.019 | Seconds_per_step --> 3.312 |
[2024-09-03 01:38:09,090][Main][INFO] - [train] Step 9200 out of 65536 | Loss --> 2.458 | Grad_l2 --> 0.651 | Weights_l2 --> 7151.751 | Lr --> 0.019 | Seconds_per_step --> 3.322 |
[2024-09-03 01:43:40,183][Main][INFO] - [train] Step 9300 out of 65536 | Loss --> 2.481 | Grad_l2 --> 0.667 | Weights_l2 --> 7157.979 | Lr --> 0.019 | Seconds_per_step --> 3.311 |
[2024-09-03 01:49:12,323][Main][INFO] - [train] Step 9400 out of 65536 | Loss --> 2.454 | Grad_l2 --> 0.500 | Weights_l2 --> 7164.722 | Lr --> 0.019 | Seconds_per_step --> 3.321 |
[2024-09-03 01:54:44,360][Main][INFO] - [train] Step 9500 out of 65536 | Loss --> 2.434 | Grad_l2 --> 0.434 | Weights_l2 --> 7171.100 | Lr --> 0.020 | Seconds_per_step --> 3.320 |
[2024-09-03 02:00:15,384][Main][INFO] - [train] Step 9600 out of 65536 | Loss --> 2.430 | Grad_l2 --> 0.459 | Weights_l2 --> 7177.669 | Lr --> 0.020 | Seconds_per_step --> 3.310 |
[2024-09-03 02:05:47,653][Main][INFO] - [train] Step 9700 out of 65536 | Loss --> 2.435 | Grad_l2 --> 0.458 | Weights_l2 --> 7184.407 | Lr --> 0.020 | Seconds_per_step --> 3.323 |
[2024-09-03 02:11:19,839][Main][INFO] - [train] Step 9800 out of 65536 | Loss --> 2.431 | Grad_l2 --> 0.796 | Weights_l2 --> 7190.992 | Lr --> 0.020 | Seconds_per_step --> 3.322 |
[2024-09-03 02:16:50,929][Main][INFO] - [train] Step 9900 out of 65536 | Loss --> 2.403 | Grad_l2 --> 0.782 | Weights_l2 --> 7197.863 | Lr --> 0.020 | Seconds_per_step --> 3.311 |
[2024-09-03 02:22:23,236][Main][INFO] - [train] Step 10000 out of 65536 | Loss --> 2.445 | Grad_l2 --> 1.140 | Weights_l2 --> 7204.637 | Lr --> 0.020 | Seconds_per_step --> 3.323 |
[2024-09-03 02:22:23,238][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000
[2024-09-03 02:22:23,245][accelerate.utils.other][WARNING] - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-09-03 02:22:29,395][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors
[2024-09-03 02:22:38,780][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin
[2024-09-03 02:22:38,784][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin
[2024-09-03 02:22:38,784][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin
[2024-09-03 02:22:38,785][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin
[2024-09-03 02:22:38,790][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl
[2024-09-03 02:28:09,713][Main][INFO] - [train] Step 10100 out of 65536 | Loss --> 2.441 | Grad_l2 --> 1.063 | Weights_l2 --> 7212.671 | Lr --> 0.020 | Seconds_per_step --> 3.465 |
[2024-09-03 02:33:42,096][Main][INFO] - [train] Step 10200 out of 65536 | Loss --> 2.421 | Grad_l2 --> 1.135 | Weights_l2 --> 7219.539 | Lr --> 0.020 | Seconds_per_step --> 3.324 |
[2024-09-03 02:39:14,331][Main][INFO] - [train] Step 10300 out of 65536 | Loss --> 2.408 | Grad_l2 --> 1.377 | Weights_l2 --> 7226.397 | Lr --> 0.020 | Seconds_per_step --> 3.322 |
[2024-09-03 02:44:45,309][Main][INFO] - [train] Step 10400 out of 65536 | Loss --> 2.385 | Grad_l2 --> 1.568 | Weights_l2 --> 7232.973 | Lr --> 0.020 | Seconds_per_step --> 3.310 |
[2024-09-03 02:50:17,356][Main][INFO] - [train] Step 10500 out of 65536 | Loss --> 2.383 | Grad_l2 --> 5.267 | Weights_l2 --> 7238.788 | Lr --> 0.020 | Seconds_per_step --> 3.320 |
[2024-09-03 02:55:49,191][Main][INFO] - [train] Step 10600 out of 65536 | Loss --> 51.695 | Grad_l2 --> 2316.455 | Weights_l2 --> 7233.899 | Lr --> 0.020 | Seconds_per_step --> 3.318 |
[2024-09-03 03:01:20,350][Main][INFO] - [train] Step 10700 out of 65536 | Loss --> 19.189 | Grad_l2 --> 206.407 | Weights_l2 --> 7221.798 | Lr --> 0.020 | Seconds_per_step --> 3.312 |
[2024-09-03 03:06:52,743][Main][INFO] - [train] Step 10800 out of 65536 | Loss --> 6.908 | Grad_l2 --> 26.249 | Weights_l2 --> 7210.980 | Lr --> 0.020 | Seconds_per_step --> 3.324 |
[2024-09-03 03:12:23,733][Main][INFO] - [train] Step 10900 out of 65536 | Loss --> 42.736 | Grad_l2 --> 1292.659 | Weights_l2 --> 7206.464 | Lr --> 0.020 | Seconds_per_step --> 3.310 |