bigscience-bot committed on
Commit
93f5d24
1 Parent(s): 7ba1e26
Files changed (1)
  1. logs/main_log.txt +196 -0
logs/main_log.txt CHANGED
@@ -96923,3 +96923,199 @@ time (ms)
96923
  time (ms)
96924
  iteration 1998/ 292968 | consumed samples: 4091904 | consumed tokens: 471121920 | elapsed time per iteration (ms): 106021.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.901303E+00 | loss scale: 32768.0 | grad norm: 15104.265 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96925
  time (ms)
96926
+ iteration 1999/ 292968 | consumed samples: 4093952 | consumed tokens: 471465984 | elapsed time per iteration (ms): 105576.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.945539E+00 | loss scale: 32768.0 | grad norm: 22390.801 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96927
+ time (ms)
96928
+ [2021-10-27 07:53:42,718] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[9.99992373584452e-05, 9.99992373584452e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
96929
+ steps: 2000 loss: 3.9148 iter time (s): 0.053 samples/sec: 38744.481
96930
+ iteration 2000/ 292968 | consumed samples: 4096000 | consumed tokens: 471810048 | elapsed time per iteration (ms): 105723.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.914763E+00 | loss scale: 65536.0 | grad norm: 19113.174 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96931
+ time (ms)
96932
+ iteration 2001/ 292968 | consumed samples: 4098048 | consumed tokens: 472154112 | elapsed time per iteration (ms): 105032.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.919772E+00 | loss scale: 65536.0 | grad norm: 45665.550 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96933
+ time (ms)
96934
+ iteration 2002/ 292968 | consumed samples: 4100096 | consumed tokens: 472498176 | elapsed time per iteration (ms): 104883.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.920336E+00 | loss scale: 65536.0 | grad norm: 80367.931 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96935
+ time (ms)
96936
+ iteration 2003/ 292968 | consumed samples: 4102144 | consumed tokens: 472842240 | elapsed time per iteration (ms): 106158.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.942242E+00 | loss scale: 65536.0 | grad norm: 46148.047 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96937
+ time (ms)
96938
+ iteration 2004/ 292968 | consumed samples: 4104192 | consumed tokens: 473186304 | elapsed time per iteration (ms): 107745.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.898877E+00 | loss scale: 65536.0 | grad norm: 36023.288 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96939
+ time (ms)
96940
+ iteration 2005/ 292968 | consumed samples: 4106240 | consumed tokens: 473530368 | elapsed time per iteration (ms): 104817.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.943701E+00 | loss scale: 65536.0 | grad norm: 38876.683 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96941
+ time (ms)
96942
+ iteration 2006/ 292968 | consumed samples: 4108288 | consumed tokens: 473874432 | elapsed time per iteration (ms): 106505.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.931247E+00 | loss scale: 65536.0 | grad norm: 33470.765 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96943
+ time (ms)
96944
+ iteration 2007/ 292968 | consumed samples: 4110336 | consumed tokens: 474218496 | elapsed time per iteration (ms): 106419.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.964560E+00 | loss scale: 65536.0 | grad norm: 29687.656 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96945
+ time (ms)
96946
+ iteration 2008/ 292968 | consumed samples: 4112384 | consumed tokens: 474562560 | elapsed time per iteration (ms): 107192.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.907559E+00 | loss scale: 65536.0 | grad norm: 39289.522 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96947
+ time (ms)
96948
+ iteration 2009/ 292968 | consumed samples: 4114432 | consumed tokens: 474906624 | elapsed time per iteration (ms): 104955.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.923905E+00 | loss scale: 65536.0 | grad norm: 35524.350 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96949
+ time (ms)
96950
+ iteration 2010/ 292968 | consumed samples: 4116480 | consumed tokens: 475250688 | elapsed time per iteration (ms): 108872.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.916351E+00 | loss scale: 65536.0 | grad norm: 24785.216 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96951
+ time (ms)
96952
+ iteration 2011/ 292968 | consumed samples: 4118528 | consumed tokens: 475594752 | elapsed time per iteration (ms): 104228.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.904333E+00 | loss scale: 65536.0 | grad norm: 39590.400 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96953
+ time (ms)
96954
+ iteration 2012/ 292968 | consumed samples: 4120576 | consumed tokens: 475938816 | elapsed time per iteration (ms): 104741.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.935276E+00 | loss scale: 65536.0 | grad norm: 37903.947 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96955
+ time (ms)
96956
+ iteration 2013/ 292968 | consumed samples: 4122624 | consumed tokens: 476282880 | elapsed time per iteration (ms): 106897.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.902785E+00 | loss scale: 65536.0 | grad norm: 20403.308 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96957
+ time (ms)
96958
+ iteration 2014/ 292968 | consumed samples: 4124672 | consumed tokens: 476626944 | elapsed time per iteration (ms): 105250.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.913821E+00 | loss scale: 65536.0 | grad norm: 23439.287 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96959
+ time (ms)
96960
+ iteration 2015/ 292968 | consumed samples: 4126720 | consumed tokens: 476971008 | elapsed time per iteration (ms): 107340.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.909118E+00 | loss scale: 65536.0 | grad norm: 25155.083 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96961
+ time (ms)
96962
+ iteration 2016/ 292968 | consumed samples: 4128768 | consumed tokens: 477315072 | elapsed time per iteration (ms): 105778.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.910501E+00 | loss scale: 65536.0 | grad norm: 23160.791 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96963
+ time (ms)
96964
+ iteration 2017/ 292968 | consumed samples: 4130816 | consumed tokens: 477659136 | elapsed time per iteration (ms): 104080.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.925369E+00 | loss scale: 65536.0 | grad norm: 24046.482 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96965
+ time (ms)
96966
+ iteration 2018/ 292968 | consumed samples: 4132864 | consumed tokens: 478003200 | elapsed time per iteration (ms): 107274.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.888488E+00 | loss scale: 65536.0 | grad norm: 25188.690 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96967
+ time (ms)
96968
+ iteration 2019/ 292968 | consumed samples: 4134912 | consumed tokens: 478347264 | elapsed time per iteration (ms): 105437.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.891493E+00 | loss scale: 65536.0 | grad norm: 23830.177 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96969
+ time (ms)
96970
+ iteration 2020/ 292968 | consumed samples: 4136960 | consumed tokens: 478691328 | elapsed time per iteration (ms): 103301.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.916592E+00 | loss scale: 65536.0 | grad norm: 32223.798 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96971
+ time (ms)
96972
+ iteration 2021/ 292968 | consumed samples: 4139008 | consumed tokens: 479035392 | elapsed time per iteration (ms): 108652.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.889750E+00 | loss scale: 65536.0 | grad norm: 40872.900 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96973
+ time (ms)
96974
+ iteration 2022/ 292968 | consumed samples: 4141056 | consumed tokens: 479379456 | elapsed time per iteration (ms): 107504.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.892524E+00 | loss scale: 65536.0 | grad norm: 28959.533 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96975
+ time (ms)
96976
+ iteration 2023/ 292968 | consumed samples: 4143104 | consumed tokens: 479723520 | elapsed time per iteration (ms): 105937.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.947634E+00 | loss scale: 65536.0 | grad norm: 28395.060 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96977
+ time (ms)
96978
+ iteration 2024/ 292968 | consumed samples: 4145152 | consumed tokens: 480067584 | elapsed time per iteration (ms): 105944.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.909609E+00 | loss scale: 65536.0 | grad norm: 25389.003 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96979
+ time (ms)
96980
+ iteration 2025/ 292968 | consumed samples: 4147200 | consumed tokens: 480411648 | elapsed time per iteration (ms): 105717.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.912829E+00 | loss scale: 65536.0 | grad norm: 23156.778 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96981
+ time (ms)
96982
+ iteration 2026/ 292968 | consumed samples: 4149248 | consumed tokens: 480755712 | elapsed time per iteration (ms): 106132.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.903689E+00 | loss scale: 65536.0 | grad norm: 32742.610 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96983
+ time (ms)
96984
+ iteration 2027/ 292968 | consumed samples: 4151296 | consumed tokens: 481099776 | elapsed time per iteration (ms): 107458.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.886243E+00 | loss scale: 65536.0 | grad norm: 31171.176 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96985
+ time (ms)
96986
+ iteration 2028/ 292968 | consumed samples: 4153344 | consumed tokens: 481443840 | elapsed time per iteration (ms): 105627.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.913125E+00 | loss scale: 65536.0 | grad norm: 25790.752 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
96987
+ time (ms)
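Each iteration record above is a run of pipe-separated "key: value" fields. The Python sketch below parses one record, assuming every record keeps this exact layout. `parse_iteration_line` and `check_consumption` are hypothetical helpers and are not part of the training code. The check encodes a pattern visible in the logged steps: consumed samples grow by the global batch size (2048) on every step, and consumed tokens grow by 2048 * 168 = 344064, which is the global batch size times the curriculum seqlen.

```python
import re


def parse_iteration_line(line: str) -> dict:
    """Parse one 'iteration N/ M | key: value | ... |' record into a dict.

    Assumes the field layout shown in this log; the diff '+' marker is stripped.
    """
    line = line.lstrip("+").strip()
    if not line.startswith("iteration"):
        raise ValueError("not an iteration line")
    # Drop the trailing '|' and split into non-empty fields.
    fields = [f.strip() for f in line.strip("|").split("|") if f.strip()]
    m = re.match(r"iteration\s+(\d+)/\s*(\d+)", fields[0])
    if m is None:
        raise ValueError("malformed iteration field")
    record = {"iteration": int(m.group(1)), "total_iterations": int(m.group(2))}
    for field in fields[1:]:
        # Split on the first ':' only; keys such as
        # 'elapsed time per iteration (ms)' contain no colon.
        key, _, value = field.partition(":")
        record[key.strip()] = float(value)
    return record


def check_consumption(prev: dict, cur: dict) -> bool:
    """Check the step-to-step deltas observed in this log (an assumption,
    not a documented invariant): samples += global batch size,
    tokens += global batch size * curriculum seqlen."""
    gbs = cur["global batch size"]
    seqlen = cur["curriculum seqlen"]
    return (
        cur["consumed samples"] - prev["consumed samples"] == gbs
        and cur["consumed tokens"] - prev["consumed tokens"] == gbs * seqlen
    )
```

Floats hold these integer counts exactly (they are far below 2**53), so comparing the differences with `==` is safe here.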
96988
+ saving checkpoint at iteration 2028 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
96989
+ [2021-10-27 08:43:15,456] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/mp_rank_00_model_states.pt
96990
+ [2021-10-27 08:43:15,754] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/mp_rank_01_model_states.pt
96991
+ [2021-10-27 08:43:28,422] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_77_optim_states.pt
96992
+ [2021-10-27 08:43:28,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_05_optim_states.pt
96993
+ [2021-10-27 08:43:28,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_116_optim_states.pt
96994
+ [2021-10-27 08:43:28,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_76_optim_states.pt
96995
+ [2021-10-27 08:43:28,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_112_optim_states.pt
96996
+ [2021-10-27 08:43:28,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_84_optim_states.pt
96997
+ [2021-10-27 08:43:28,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_123_optim_states.pt
96998
+ [2021-10-27 08:43:28,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_115_optim_states.pt
96999
+ [2021-10-27 08:43:28,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_08_optim_states.pt
97000
+ [2021-10-27 08:43:28,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_111_optim_states.pt
97001
+ [2021-10-27 08:43:28,771] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_22_optim_states.pt
97002
+ [2021-10-27 08:43:28,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_87_optim_states.pt
97003
+ [2021-10-27 08:43:28,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_73_optim_states.pt
97004
+ [2021-10-27 08:43:28,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_07_optim_states.pt
97005
+ [2021-10-27 08:43:28,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_10_optim_states.pt
97006
+ [2021-10-27 08:43:28,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_15_optim_states.pt
97007
+ [2021-10-27 08:43:28,896] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_107_optim_states.pt
97008
+ [2021-10-27 08:43:28,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_98_optim_states.pt
97009
+ [2021-10-27 08:43:28,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_106_optim_states.pt
97010
+ [2021-10-27 08:43:28,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_120_optim_states.pt
97011
+ [2021-10-27 08:43:28,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_72_optim_states.pt
97012
+ [2021-10-27 08:43:28,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_23_optim_states.pt
97013
+ [2021-10-27 08:43:28,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_118_optim_states.pt
97014
+ [2021-10-27 08:43:28,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_14_optim_states.pt
97015
+ [2021-10-27 08:43:29,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_97_optim_states.pt
97016
+ [2021-10-27 08:43:29,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_90_optim_states.pt
97017
+ [2021-10-27 08:43:29,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_80_optim_states.pt
97018
+ [2021-10-27 08:43:29,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_27_optim_states.pt
97019
+ [2021-10-27 08:43:29,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_88_optim_states.pt
97020
+ [2021-10-27 08:43:29,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_25_optim_states.pt
97021
+ [2021-10-27 08:43:29,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_81_optim_states.pt
97022
+ [2021-10-27 08:43:29,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_95_optim_states.pt
97023
+ [2021-10-27 08:43:29,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_71_optim_states.pt
97024
+ [2021-10-27 08:43:29,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_59_optim_states.pt
97025
+ [2021-10-27 08:43:29,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_119_optim_states.pt
97026
+ [2021-10-27 08:43:29,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_114_optim_states.pt
97027
+ [2021-10-27 08:43:29,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_108_optim_states.pt
97028
+ [2021-10-27 08:43:29,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_78_optim_states.pt
97029
+ [2021-10-27 08:43:29,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_58_optim_states.pt
97030
+ [2021-10-27 08:43:29,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_48_optim_states.pt
97031
+ [2021-10-27 08:43:29,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_11_optim_states.pt
97032
+ [2021-10-27 08:43:29,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_121_optim_states.pt
97033
+ [2021-10-27 08:43:29,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_79_optim_states.pt
97034
+ [2021-10-27 08:43:29,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_109_optim_states.pt
97035
+ [2021-10-27 08:43:29,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_63_optim_states.pt
97036
+ [2021-10-27 08:43:29,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_93_optim_states.pt
97037
+ [2021-10-27 08:43:29,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_86_optim_states.pt
97038
+ [2021-10-27 08:43:29,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_06_optim_states.pt
97039
+ [2021-10-27 08:43:29,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_83_optim_states.pt
97040
+ [2021-10-27 08:43:29,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_04_optim_states.pt
97041
+ [2021-10-27 08:43:29,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_110_optim_states.pt
97042
+ [2021-10-27 08:43:29,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_122_optim_states.pt
97043
+ [2021-10-27 08:43:29,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_75_optim_states.pt
97044
+ [2021-10-27 08:43:29,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_74_optim_states.pt
97045
+ [2021-10-27 08:43:29,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_91_optim_states.pt
97046
+ [2021-10-27 08:43:29,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_117_optim_states.pt
97047
+ [2021-10-27 08:43:29,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_37_optim_states.pt
97048
+ [2021-10-27 08:43:29,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_99_optim_states.pt
97049
+ [2021-10-27 08:43:30,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_68_optim_states.pt
97050
+ [2021-10-27 08:43:30,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_105_optim_states.pt
97051
+ [2021-10-27 08:43:30,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_67_optim_states.pt
97052
+ [2021-10-27 08:43:30,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_96_optim_states.pt
97053
+ [2021-10-27 08:43:30,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_26_optim_states.pt
97054
+ [2021-10-27 08:43:30,031] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_24_optim_states.pt
97055
+ [2021-10-27 08:43:30,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_94_optim_states.pt
97056
+ [2021-10-27 08:43:30,036] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_51_optim_states.pt
97057
+ [2021-10-27 08:43:30,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_89_optim_states.pt
97058
+ [2021-10-27 08:43:30,087] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_82_optim_states.pt
97059
+ [2021-10-27 08:43:30,108] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_103_optim_states.pt
97060
+ [2021-10-27 08:43:30,110] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_85_optim_states.pt
97061
+ [2021-10-27 08:43:30,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_104_optim_states.pt
97062
+ [2021-10-27 08:43:30,141] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_61_optim_states.pt
97063
+ [2021-10-27 08:43:30,158] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_36_optim_states.pt
97064
+ [2021-10-27 08:43:30,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_13_optim_states.pt
97065
+ [2021-10-27 08:43:30,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_09_optim_states.pt
97066
+ [2021-10-27 08:43:30,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_12_optim_states.pt
97067
+ [2021-10-27 08:43:30,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_30_optim_states.pt
97068
+ [2021-10-27 08:43:30,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_29_optim_states.pt
97069
+ [2021-10-27 08:43:30,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_64_optim_states.pt
97070
+ [2021-10-27 08:43:30,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_46_optim_states.pt
97071
+ [2021-10-27 08:43:30,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_32_optim_states.pt
97072
+ [2021-10-27 08:43:30,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_92_optim_states.pt
97073
+ [2021-10-27 08:43:30,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_52_optim_states.pt
+ [2021-10-27 08:43:30,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_60_optim_states.pt
+ [2021-10-27 08:43:30,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_40_optim_states.pt
+ [2021-10-27 08:43:30,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_69_optim_states.pt
+ [2021-10-27 08:43:30,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_47_optim_states.pt
+ [2021-10-27 08:43:30,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_100_optim_states.pt
+ [2021-10-27 08:43:30,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_113_optim_states.pt
+ [2021-10-27 08:43:30,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_102_optim_states.pt
+ [2021-10-27 08:43:30,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_33_optim_states.pt
+ [2021-10-27 08:43:30,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_101_optim_states.pt
+ [2021-10-27 08:43:30,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_57_optim_states.pt
+ [2021-10-27 08:43:30,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_54_optim_states.pt
+ [2021-10-27 08:43:30,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_65_optim_states.pt
+ [2021-10-27 08:43:30,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_62_optim_states.pt
+ [2021-10-27 08:43:30,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_66_optim_states.pt
+ [2021-10-27 08:43:30,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_70_optim_states.pt
+ [2021-10-27 08:43:30,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_43_optim_states.pt
+ [2021-10-27 08:43:30,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_49_optim_states.pt
+ [2021-10-27 08:43:30,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_41_optim_states.pt
+ [2021-10-27 08:43:30,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_39_optim_states.pt
+ [2021-10-27 08:43:30,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_55_optim_states.pt
+ [2021-10-27 08:43:30,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_35_optim_states.pt
+ [2021-10-27 08:43:30,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_56_optim_states.pt
+ [2021-10-27 08:43:30,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_38_optim_states.pt
+ [2021-10-27 08:43:30,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_42_optim_states.pt
+ [2021-10-27 08:43:30,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_53_optim_states.pt
+ [2021-10-27 08:43:30,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_50_optim_states.pt
+ [2021-10-27 08:43:30,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_44_optim_states.pt
+ [2021-10-27 08:43:30,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_45_optim_states.pt
+ [2021-10-27 08:43:31,100] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_34_optim_states.pt
+ [2021-10-27 08:43:31,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_126_optim_states.pt
+ [2021-10-27 08:43:31,250] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_00_optim_states.pt
+ [2021-10-27 08:43:31,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_03_optim_states.pt
+ [2021-10-27 08:43:31,416] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_127_optim_states.pt
+ [2021-10-27 08:43:32,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_124_optim_states.pt
+ [2021-10-27 08:43:32,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_02_optim_states.pt
+ [2021-10-27 08:43:32,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_125_optim_states.pt
+ [2021-10-27 08:43:33,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_18_optim_states.pt
+ [2021-10-27 08:43:33,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_01_optim_states.pt
+ [2021-10-27 08:43:34,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_21_optim_states.pt
+ [2021-10-27 08:43:34,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_20_optim_states.pt
+ [2021-10-27 08:43:36,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_31_optim_states.pt
+ [2021-10-27 08:43:37,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_16_optim_states.pt
+ [2021-10-27 08:43:37,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_28_optim_states.pt
+ [2021-10-27 08:43:38,123] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_19_optim_states.pt
+ [2021-10-27 08:43:38,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2028/zero_pp_rank_0_mp_rank_17_optim_states.pt
+ successfully saved checkpoint at iteration 2028 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ time (ms) | save-checkpoint: 25851.77
+ [exiting program after 1190.4112010161082 minutes] datetime: 2021-10-27 08:43:38