Commit 93f5d24
Parent(s): 7ba1e26
new data

logs/main_log.txt +196 -0
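Each `iteration N/ M | key: value | ...` line in this log uses a fixed pipe-separated layout. Across the excerpt, consumed samples grow by the global batch size (2048) per iteration, and consumed tokens grow by global batch size × curriculum seqlen (2048 × 168 = 344,064). The minimal Python sketch below parses such a line and checks that accounting between two consecutive lines. The helper names `parse_iteration_line` and `step_is_consistent` are hypothetical and are not part of the training code. The sketch also assumes that every field after the first is numeric and that a step's tokens are counted at that step's curriculum seqlen. The two sample lines are copied verbatim from iterations 1998 and 1999 of the diff.

```python
import re

# Matches the leading field, e.g. "iteration 1998/ 292968" (note the space after '/').
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def parse_iteration_line(line: str) -> dict:
    """Parse one 'iteration N/ M | key: value | ...' log line into a dict."""
    # Split on '|' and drop empty fragments (each line ends with a trailing '|').
    fields = [f.strip() for f in line.split("|") if f.strip()]
    m = ITER_RE.fullmatch(fields[0]) if fields else None
    if m is None:
        raise ValueError(f"not an iteration line: {line[:40]!r}")
    record = {"iteration": int(m.group(1)), "total iterations": int(m.group(2))}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        # Assumption: every remaining field is numeric (true for this excerpt).
        record[key.strip()] = float(value)
    return record


def step_is_consistent(prev: dict, cur: dict) -> bool:
    """Check sample/token accounting between two consecutive iteration lines."""
    batch = cur["global batch size"]
    # Assumption: tokens for a step are counted at that step's curriculum seqlen.
    seqlen = cur["curriculum seqlen"]
    return (
        cur["iteration"] == prev["iteration"] + 1
        and cur["consumed samples"] - prev["consumed samples"] == batch
        and cur["consumed tokens"] - prev["consumed tokens"] == batch * seqlen
    )


LINE_1998 = (
    "iteration 1998/ 292968 | consumed samples: 4091904 | consumed tokens: 471121920 | "
    "elapsed time per iteration (ms): 106021.3 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.901303E+00 | loss scale: 32768.0 | "
    "grad norm: 15104.265 | num zeros: 0.0 | curriculum seqlen: 168 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_1999 = (
    "iteration 1999/ 292968 | consumed samples: 4093952 | consumed tokens: 471465984 | "
    "elapsed time per iteration (ms): 105576.9 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.945539E+00 | loss scale: 32768.0 | "
    "grad norm: 22390.801 | num zeros: 0.0 | curriculum seqlen: 168 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

prev = parse_iteration_line(LINE_1998)
cur = parse_iteration_line(LINE_1999)
print(step_is_consistent(prev, cur))  # True: +2048 samples, +344064 tokens
```

Lines from the diff view should be passed without the leading `+` marker and the `NNNNN |` line-number rows. Non-iteration lines, such as the DeepSpeed `steps: 2000 loss: ...` summary, raise `ValueError`.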
@@ -96923,3 +96923,199 @@ time (ms)
96923 |
time (ms)
|
96924 |
iteration 1998/ 292968 | consumed samples: 4091904 | consumed tokens: 471121920 | elapsed time per iteration (ms): 106021.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.901303E+00 | loss scale: 32768.0 | grad norm: 15104.265 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96925 |
time (ms)
|
96926 |
+
iteration 1999/ 292968 | consumed samples: 4093952 | consumed tokens: 471465984 | elapsed time per iteration (ms): 105576.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.945539E+00 | loss scale: 32768.0 | grad norm: 22390.801 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96927 |
+
time (ms)
|
96928 |
+
[2021-10-27 07:53:42,718] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[9.99992373584452e-05, 9.99992373584452e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
|
96929 |
+
steps: 2000 loss: 3.9148 iter time (s): 0.053 samples/sec: 38744.481
|
96930 |
+
iteration 2000/ 292968 | consumed samples: 4096000 | consumed tokens: 471810048 | elapsed time per iteration (ms): 105723.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.914763E+00 | loss scale: 65536.0 | grad norm: 19113.174 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96931 |
+
time (ms)
|
96932 |
+
iteration 2001/ 292968 | consumed samples: 4098048 | consumed tokens: 472154112 | elapsed time per iteration (ms): 105032.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.919772E+00 | loss scale: 65536.0 | grad norm: 45665.550 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96933 |
+
time (ms)
|
96934 |
+
iteration 2002/ 292968 | consumed samples: 4100096 | consumed tokens: 472498176 | elapsed time per iteration (ms): 104883.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.920336E+00 | loss scale: 65536.0 | grad norm: 80367.931 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96935 |
+
time (ms)
|
96936 |
+
iteration 2003/ 292968 | consumed samples: 4102144 | consumed tokens: 472842240 | elapsed time per iteration (ms): 106158.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.942242E+00 | loss scale: 65536.0 | grad norm: 46148.047 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96937 |
+
time (ms)
|
96938 |
+
iteration 2004/ 292968 | consumed samples: 4104192 | consumed tokens: 473186304 | elapsed time per iteration (ms): 107745.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.898877E+00 | loss scale: 65536.0 | grad norm: 36023.288 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96939 |
+
time (ms)
|
96940 |
+
iteration 2005/ 292968 | consumed samples: 4106240 | consumed tokens: 473530368 | elapsed time per iteration (ms): 104817.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.943701E+00 | loss scale: 65536.0 | grad norm: 38876.683 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96941 |
+
time (ms)
|
96942 |
+
iteration 2006/ 292968 | consumed samples: 4108288 | consumed tokens: 473874432 | elapsed time per iteration (ms): 106505.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.931247E+00 | loss scale: 65536.0 | grad norm: 33470.765 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96943 |
+
time (ms)
|
96944 |
+
iteration 2007/ 292968 | consumed samples: 4110336 | consumed tokens: 474218496 | elapsed time per iteration (ms): 106419.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.964560E+00 | loss scale: 65536.0 | grad norm: 29687.656 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96945 |
+
time (ms)
|
96946 |
+
iteration 2008/ 292968 | consumed samples: 4112384 | consumed tokens: 474562560 | elapsed time per iteration (ms): 107192.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.907559E+00 | loss scale: 65536.0 | grad norm: 39289.522 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96947 |
+
time (ms)
|
96948 |
+
iteration 2009/ 292968 | consumed samples: 4114432 | consumed tokens: 474906624 | elapsed time per iteration (ms): 104955.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.923905E+00 | loss scale: 65536.0 | grad norm: 35524.350 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96949 |
+
time (ms)
|
96950 |
+
iteration 2010/ 292968 | consumed samples: 4116480 | consumed tokens: 475250688 | elapsed time per iteration (ms): 108872.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.916351E+00 | loss scale: 65536.0 | grad norm: 24785.216 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96951 |
+
time (ms)
|
96952 |
+
iteration 2011/ 292968 | consumed samples: 4118528 | consumed tokens: 475594752 | elapsed time per iteration (ms): 104228.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.904333E+00 | loss scale: 65536.0 | grad norm: 39590.400 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96953 |
+
time (ms)
|
96954 |
+
iteration 2012/ 292968 | consumed samples: 4120576 | consumed tokens: 475938816 | elapsed time per iteration (ms): 104741.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.935276E+00 | loss scale: 65536.0 | grad norm: 37903.947 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96955 |
+
time (ms)
|
96956 |
+
iteration 2013/ 292968 | consumed samples: 4122624 | consumed tokens: 476282880 | elapsed time per iteration (ms): 106897.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.902785E+00 | loss scale: 65536.0 | grad norm: 20403.308 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96957 |
+
time (ms)
|
96958 |
+
iteration 2014/ 292968 | consumed samples: 4124672 | consumed tokens: 476626944 | elapsed time per iteration (ms): 105250.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.913821E+00 | loss scale: 65536.0 | grad norm: 23439.287 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96959 |
+
time (ms)
|
96960 |
+
iteration 2015/ 292968 | consumed samples: 4126720 | consumed tokens: 476971008 | elapsed time per iteration (ms): 107340.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.909118E+00 | loss scale: 65536.0 | grad norm: 25155.083 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96961 |
+
time (ms)
|
96962 |
+
iteration 2016/ 292968 | consumed samples: 4128768 | consumed tokens: 477315072 | elapsed time per iteration (ms): 105778.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.910501E+00 | loss scale: 65536.0 | grad norm: 23160.791 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96963 |
+
time (ms)
|
96964 |
+
iteration 2017/ 292968 | consumed samples: 4130816 | consumed tokens: 477659136 | elapsed time per iteration (ms): 104080.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.925369E+00 | loss scale: 65536.0 | grad norm: 24046.482 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96965 |
+
time (ms)
|
96966 |
+
iteration 2018/ 292968 | consumed samples: 4132864 | consumed tokens: 478003200 | elapsed time per iteration (ms): 107274.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.888488E+00 | loss scale: 65536.0 | grad norm: 25188.690 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96967 |
+
time (ms)
|
96968 |
+
iteration 2019/ 292968 | consumed samples: 4134912 | consumed tokens: 478347264 | elapsed time per iteration (ms): 105437.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.891493E+00 | loss scale: 65536.0 | grad norm: 23830.177 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96969 |
+
time (ms)
|
96970 |
+
iteration 2020/ 292968 | consumed samples: 4136960 | consumed tokens: 478691328 | elapsed time per iteration (ms): 103301.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.916592E+00 | loss scale: 65536.0 | grad norm: 32223.798 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96971 |
+
time (ms)
|
96972 |
+
iteration 2021/ 292968 | consumed samples: 4139008 | consumed tokens: 479035392 | elapsed time per iteration (ms): 108652.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.889750E+00 | loss scale: 65536.0 | grad norm: 40872.900 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96973 |
+
time (ms)
|
96974 |
+
iteration 2022/ 292968 | consumed samples: 4141056 | consumed tokens: 479379456 | elapsed time per iteration (ms): 107504.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.892524E+00 | loss scale: 65536.0 | grad norm: 28959.533 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96975 |
+
time (ms)
|
96976 |
+
iteration 2023/ 292968 | consumed samples: 4143104 | consumed tokens: 479723520 | elapsed time per iteration (ms): 105937.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.947634E+00 | loss scale: 65536.0 | grad norm: 28395.060 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96977 |
+
time (ms)
|
96978 |
+
iteration 2024/ 292968 | consumed samples: 4145152 | consumed tokens: 480067584 | elapsed time per iteration (ms): 105944.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.909609E+00 | loss scale: 65536.0 | grad norm: 25389.003 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96979 |
+
time (ms)
|
96980 |
+
iteration 2025/ 292968 | consumed samples: 4147200 | consumed tokens: 480411648 | elapsed time per iteration (ms): 105717.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.912829E+00 | loss scale: 65536.0 | grad norm: 23156.778 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96981 |
+
time (ms)
|
96982 |
+
iteration 2026/ 292968 | consumed samples: 4149248 | consumed tokens: 480755712 | elapsed time per iteration (ms): 106132.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.903689E+00 | loss scale: 65536.0 | grad norm: 32742.610 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96983 |
+
time (ms)
|
96984 |
+
iteration 2027/ 292968 | consumed samples: 4151296 | consumed tokens: 481099776 | elapsed time per iteration (ms): 107458.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.886243E+00 | loss scale: 65536.0 | grad norm: 31171.176 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96985 |
+
time (ms)
|
96986 |
+
iteration 2028/ 292968 | consumed samples: 4153344 | consumed tokens: 481443840 | elapsed time per iteration (ms): 105627.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.913125E+00 | loss scale: 65536.0 | grad norm: 25790.752 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
96987 |
+
time (ms)
|
96988 |
+
saving checkpoint at iteration 2028 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
|
96989 |
+
[2021-10-27 08:43:15,456] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/mp_rank_00_model_states.pt
|
96990 |
+
[2021-10-27 08:43:15,754] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/mp_rank_01_model_states.pt
|
96991 |
+
[2021-10-27 08:43:28,422] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_77_optim_states.pt
|
96992 |
+
[2021-10-27 08:43:28,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_05_optim_states.pt
|
96993 |
+
[2021-10-27 08:43:28,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_116_optim_states.pt
|
96994 |
+
[2021-10-27 08:43:28,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_76_optim_states.pt
|
96995 |
+
[2021-10-27 08:43:28,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_112_optim_states.pt
|
96996 |
+
[2021-10-27 08:43:28,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_84_optim_states.pt
|
96997 |
+
[2021-10-27 08:43:28,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_123_optim_states.pt
|
96998 |
+
[2021-10-27 08:43:28,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_115_optim_states.pt
|
96999 |
+
[2021-10-27 08:43:28,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_08_optim_states.pt
|
97000 |
+
[2021-10-27 08:43:28,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_111_optim_states.pt
|
97001 |
+
[2021-10-27 08:43:28,771] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_22_optim_states.pt
|
97002 |
+
[2021-10-27 08:43:28,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_87_optim_states.pt
|
97003 |
+
[2021-10-27 08:43:28,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_73_optim_states.pt
|
97004 |
+
[2021-10-27 08:43:28,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_07_optim_states.pt
|
97005 |
+
[2021-10-27 08:43:28,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_10_optim_states.pt
|
97006 |
+
[2021-10-27 08:43:28,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_15_optim_states.pt
|
97007 |
+
[2021-10-27 08:43:28,896] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_107_optim_states.pt
|
97008 |
+
[2021-10-27 08:43:28,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_98_optim_states.pt
|
97009 |
+
[2021-10-27 08:43:28,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_106_optim_states.pt
|
97010 |
+
[2021-10-27 08:43:28,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_120_optim_states.pt
|
97011 |
+
[2021-10-27 08:43:28,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_72_optim_states.pt
|
97012 |
+
[2021-10-27 08:43:28,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_23_optim_states.pt
|
97013 |
+
[2021-10-27 08:43:28,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_118_optim_states.pt
|
97014 |
+
[2021-10-27 08:43:28,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_14_optim_states.pt
|
97015 |
+
[2021-10-27 08:43:29,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_97_optim_states.pt
|
97016 |
+
[2021-10-27 08:43:29,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_90_optim_states.pt
|
97017 |
+
[2021-10-27 08:43:29,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_80_optim_states.pt
|
97018 |
+
[2021-10-27 08:43:29,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_27_optim_states.pt
|
97019 |
+
[2021-10-27 08:43:29,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_88_optim_states.pt
|
97020 |
+
[2021-10-27 08:43:29,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_25_optim_states.pt
|
97021 |
+
[2021-10-27 08:43:29,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_81_optim_states.pt
|
97022 |
+
[2021-10-27 08:43:29,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_95_optim_states.pt
|
97023 |
+
[2021-10-27 08:43:29,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_71_optim_states.pt
|
97024 |
+
[2021-10-27 08:43:29,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_59_optim_states.pt
|
97025 |
+
[2021-10-27 08:43:29,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_119_optim_states.pt
|
97026 |
+
[2021-10-27 08:43:29,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_114_optim_states.pt
|
97027 |
+
[2021-10-27 08:43:29,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_108_optim_states.pt
|
97028 |
+
[2021-10-27 08:43:29,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_78_optim_states.pt
|
97029 |
+
[2021-10-27 08:43:29,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_58_optim_states.pt
|
97030 |
+
[2021-10-27 08:43:29,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_48_optim_states.pt
|
97031 |
+
[2021-10-27 08:43:29,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_11_optim_states.pt
|
97032 |
+
[2021-10-27 08:43:29,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_121_optim_states.pt
|
97033 |
+
[2021-10-27 08:43:29,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_79_optim_states.pt
|
97034 |
+
[2021-10-27 08:43:29,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_109_optim_states.pt
|
97035 |
+
[2021-10-27 08:43:29,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_63_optim_states.pt
|
97036 |
+
[2021-10-27 08:43:29,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_93_optim_states.pt
|
97037 |
+
[2021-10-27 08:43:29,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_86_optim_states.pt
|
97038 |
+
[2021-10-27 08:43:29,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_06_optim_states.pt
|
97039 |
+
[2021-10-27 08:43:29,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_83_optim_states.pt
|
97040 |
+
[2021-10-27 08:43:29,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_04_optim_states.pt
|
97041 |
+
[2021-10-27 08:43:29,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_110_optim_states.pt
|
97042 |
+
[2021-10-27 08:43:29,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_122_optim_states.pt
|
97043 |
+
[2021-10-27 08:43:29,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_75_optim_states.pt
|
97044 |
+
[2021-10-27 08:43:29,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_74_optim_states.pt
|
97045 |
+
[2021-10-27 08:43:29,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_91_optim_states.pt
|
97046 |
+
[2021-10-27 08:43:29,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_117_optim_states.pt
|
97047 |
+
[2021-10-27 08:43:29,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_37_optim_states.pt
|
97048 |
+
[2021-10-27 08:43:29,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_99_optim_states.pt
|
97049 |
+
[2021-10-27 08:43:30,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_68_optim_states.pt
|
97050 |
+
[2021-10-27 08:43:30,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_105_optim_states.pt
|
97051 |
+
[2021-10-27 08:43:30,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_67_optim_states.pt
|
97052 |
+
[2021-10-27 08:43:30,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_96_optim_states.pt
|
97053 |
+
[2021-10-27 08:43:30,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_26_optim_states.pt
|
97054 |
+
[2021-10-27 08:43:30,031] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_24_optim_states.pt
|
97055 |
+
[2021-10-27 08:43:30,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_94_optim_states.pt
|
97056 |
+
[2021-10-27 08:43:30,036] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_51_optim_states.pt
|
97057 |
+
[2021-10-27 08:43:30,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_89_optim_states.pt
|
97058 |
+
[2021-10-27 08:43:30,087] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_82_optim_states.pt
|
97059 |
+
[2021-10-27 08:43:30,108] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_103_optim_states.pt
|
97060 |
+
[2021-10-27 08:43:30,110] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_85_optim_states.pt
|
97061 |
+
[2021-10-27 08:43:30,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_104_optim_states.pt
|
97062 |
+
[2021-10-27 08:43:30,141] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_61_optim_states.pt
|
97063 |
+
[2021-10-27 08:43:30,158] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_36_optim_states.pt
|
97064 |
+
[2021-10-27 08:43:30,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_13_optim_states.pt
|
97065 |
+
[2021-10-27 08:43:30,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_09_optim_states.pt
|
97066 |
+
[2021-10-27 08:43:30,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_12_optim_states.pt
|
97067 |
+
[2021-10-27 08:43:30,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_30_optim_states.pt
|
97068 |
+
[2021-10-27 08:43:30,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_29_optim_states.pt
|
97069 |
+
[2021-10-27 08:43:30,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_64_optim_states.pt
[2021-10-27 08:43:30,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_46_optim_states.pt
[2021-10-27 08:43:30,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_32_optim_states.pt
[2021-10-27 08:43:30,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_92_optim_states.pt
[2021-10-27 08:43:30,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_52_optim_states.pt
[2021-10-27 08:43:30,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_60_optim_states.pt
[2021-10-27 08:43:30,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_40_optim_states.pt
[2021-10-27 08:43:30,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_69_optim_states.pt
[2021-10-27 08:43:30,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_47_optim_states.pt
[2021-10-27 08:43:30,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_100_optim_states.pt
[2021-10-27 08:43:30,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_113_optim_states.pt
[2021-10-27 08:43:30,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_102_optim_states.pt
[2021-10-27 08:43:30,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_33_optim_states.pt
[2021-10-27 08:43:30,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_101_optim_states.pt
[2021-10-27 08:43:30,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_57_optim_states.pt
[2021-10-27 08:43:30,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_54_optim_states.pt
[2021-10-27 08:43:30,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_65_optim_states.pt
[2021-10-27 08:43:30,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_62_optim_states.pt
[2021-10-27 08:43:30,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_66_optim_states.pt
[2021-10-27 08:43:30,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_70_optim_states.pt
[2021-10-27 08:43:30,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_43_optim_states.pt
[2021-10-27 08:43:30,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_49_optim_states.pt
[2021-10-27 08:43:30,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_41_optim_states.pt
[2021-10-27 08:43:30,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_39_optim_states.pt
[2021-10-27 08:43:30,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_55_optim_states.pt
[2021-10-27 08:43:30,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_35_optim_states.pt
[2021-10-27 08:43:30,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_56_optim_states.pt
[2021-10-27 08:43:30,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_38_optim_states.pt
[2021-10-27 08:43:30,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_42_optim_states.pt
[2021-10-27 08:43:30,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_53_optim_states.pt
[2021-10-27 08:43:30,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_50_optim_states.pt
[2021-10-27 08:43:30,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_44_optim_states.pt
[2021-10-27 08:43:30,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_45_optim_states.pt
[2021-10-27 08:43:31,100] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_34_optim_states.pt
[2021-10-27 08:43:31,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_126_optim_states.pt
[2021-10-27 08:43:31,250] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-10-27 08:43:31,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2021-10-27 08:43:31,416] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_127_optim_states.pt
[2021-10-27 08:43:32,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_124_optim_states.pt
[2021-10-27 08:43:32,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2021-10-27 08:43:32,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_125_optim_states.pt
[2021-10-27 08:43:33,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_18_optim_states.pt
[2021-10-27 08:43:33,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-10-27 08:43:34,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_21_optim_states.pt
[2021-10-27 08:43:34,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_20_optim_states.pt
[2021-10-27 08:43:36,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_31_optim_states.pt
[2021-10-27 08:43:37,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_16_optim_states.pt
[2021-10-27 08:43:37,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_28_optim_states.pt
[2021-10-27 08:43:38,123] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_19_optim_states.pt
[2021-10-27 08:43:38,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_17_optim_states.pt
successfully saved checkpoint at iteration 2028 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
time (ms) | save-checkpoint: 25851.77
[exiting program after 1190.4112010161082 minutes] datetime: 2021-10-27 08:43:38