bigscience-bot committed on
Commit
fb89cf0
·
1 Parent(s): 6f04a8d
Files changed (1)
  1. logs/main_log.txt +188 -0
logs/main_log.txt CHANGED
@@ -77202,3 +77202,191 @@ time (ms)
77202
  time (ms)
77203
  iteration 614/ 292968 | consumed samples: 1257472 | consumed tokens: 96960512 | elapsed time per iteration (ms): 110635.4 | learning rate: 3.353E-05 | global batch size: 2048 | lm loss: 5.375699E+00 | loss scale: 8192.0 | grad norm: 11229.270 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77204
  time (ms)
77205
+ iteration 615/ 292968 | consumed samples: 1259520 | consumed tokens: 97157120 | elapsed time per iteration (ms): 108098.9 | learning rate: 3.359E-05 | global batch size: 2048 | lm loss: 5.363403E+00 | loss scale: 8192.0 | grad norm: 10400.184 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77206
+ time (ms)
77207
+ iteration 616/ 292968 | consumed samples: 1261568 | consumed tokens: 97353728 | elapsed time per iteration (ms): 109329.1 | learning rate: 3.364E-05 | global batch size: 2048 | lm loss: 5.384151E+00 | loss scale: 8192.0 | grad norm: 12453.326 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77208
+ time (ms)
77209
+ iteration 617/ 292968 | consumed samples: 1263616 | consumed tokens: 97550336 | elapsed time per iteration (ms): 107222.2 | learning rate: 3.370E-05 | global batch size: 2048 | lm loss: 5.365817E+00 | loss scale: 8192.0 | grad norm: 12017.613 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77210
+ time (ms)
77211
+ iteration 618/ 292968 | consumed samples: 1265664 | consumed tokens: 97746944 | elapsed time per iteration (ms): 107139.4 | learning rate: 3.375E-05 | global batch size: 2048 | lm loss: 5.358659E+00 | loss scale: 8192.0 | grad norm: 9650.822 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77212
+ time (ms)
77213
+ iteration 619/ 292968 | consumed samples: 1267712 | consumed tokens: 97943552 | elapsed time per iteration (ms): 107963.7 | learning rate: 3.381E-05 | global batch size: 2048 | lm loss: 5.360062E+00 | loss scale: 8192.0 | grad norm: 9182.645 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77214
+ time (ms)
77215
+ iteration 620/ 292968 | consumed samples: 1269760 | consumed tokens: 98140160 | elapsed time per iteration (ms): 106941.4 | learning rate: 3.386E-05 | global batch size: 2048 | lm loss: 5.350104E+00 | loss scale: 8192.0 | grad norm: 10388.823 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77216
+ time (ms)
77217
+ iteration 621/ 292968 | consumed samples: 1271808 | consumed tokens: 98336768 | elapsed time per iteration (ms): 108728.6 | learning rate: 3.391E-05 | global batch size: 2048 | lm loss: 5.330681E+00 | loss scale: 8192.0 | grad norm: 10010.116 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77218
+ time (ms)
77219
+ iteration 622/ 292968 | consumed samples: 1273856 | consumed tokens: 98533376 | elapsed time per iteration (ms): 107843.5 | learning rate: 3.397E-05 | global batch size: 2048 | lm loss: 5.387991E+00 | loss scale: 8192.0 | grad norm: 11984.058 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77220
+ time (ms)
77221
+ iteration 623/ 292968 | consumed samples: 1275904 | consumed tokens: 98729984 | elapsed time per iteration (ms): 107380.4 | learning rate: 3.402E-05 | global batch size: 2048 | lm loss: 5.347582E+00 | loss scale: 8192.0 | grad norm: 9513.099 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77222
+ time (ms)
77223
+ iteration 624/ 292968 | consumed samples: 1277952 | consumed tokens: 98926592 | elapsed time per iteration (ms): 108875.1 | learning rate: 3.408E-05 | global batch size: 2048 | lm loss: 5.360654E+00 | loss scale: 8192.0 | grad norm: 11778.551 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77224
+ time (ms)
77225
+ iteration 625/ 292968 | consumed samples: 1280000 | consumed tokens: 99123200 | elapsed time per iteration (ms): 106579.6 | learning rate: 3.413E-05 | global batch size: 2048 | lm loss: 5.373547E+00 | loss scale: 8192.0 | grad norm: 10277.204 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77226
+ time (ms)
77227
+ iteration 626/ 292968 | consumed samples: 1282048 | consumed tokens: 99319808 | elapsed time per iteration (ms): 109385.4 | learning rate: 3.419E-05 | global batch size: 2048 | lm loss: 5.341951E+00 | loss scale: 8192.0 | grad norm: 10174.799 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77228
+ time (ms)
77229
+ iteration 627/ 292968 | consumed samples: 1284096 | consumed tokens: 99516416 | elapsed time per iteration (ms): 107213.8 | learning rate: 3.424E-05 | global batch size: 2048 | lm loss: 5.362940E+00 | loss scale: 8192.0 | grad norm: 10631.689 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77230
+ time (ms)
77231
+ iteration 628/ 292968 | consumed samples: 1286144 | consumed tokens: 99713024 | elapsed time per iteration (ms): 108581.1 | learning rate: 3.430E-05 | global batch size: 2048 | lm loss: 5.395461E+00 | loss scale: 8192.0 | grad norm: 12382.653 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77232
+ time (ms)
77233
+ iteration 629/ 292968 | consumed samples: 1288192 | consumed tokens: 99909632 | elapsed time per iteration (ms): 108292.6 | learning rate: 3.435E-05 | global batch size: 2048 | lm loss: 5.370893E+00 | loss scale: 8192.0 | grad norm: 9780.522 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77234
+ time (ms)
77235
+ iteration 630/ 292968 | consumed samples: 1290240 | consumed tokens: 100106240 | elapsed time per iteration (ms): 106744.8 | learning rate: 3.441E-05 | global batch size: 2048 | lm loss: 5.326004E+00 | loss scale: 8192.0 | grad norm: 12227.046 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77236
+ time (ms)
77237
+ iteration 631/ 292968 | consumed samples: 1292288 | consumed tokens: 100302848 | elapsed time per iteration (ms): 107582.1 | learning rate: 3.446E-05 | global batch size: 2048 | lm loss: 5.340735E+00 | loss scale: 8192.0 | grad norm: 11877.257 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77238
+ time (ms)
77239
+ iteration 632/ 292968 | consumed samples: 1294336 | consumed tokens: 100499456 | elapsed time per iteration (ms): 107181.5 | learning rate: 3.452E-05 | global batch size: 2048 | lm loss: 5.347682E+00 | loss scale: 8192.0 | grad norm: 12827.897 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77240
+ time (ms)
77241
+ iteration 633/ 292968 | consumed samples: 1296384 | consumed tokens: 100696064 | elapsed time per iteration (ms): 107386.1 | learning rate: 3.457E-05 | global batch size: 2048 | lm loss: 5.321402E+00 | loss scale: 8192.0 | grad norm: 10107.434 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77242
+ time (ms)
77243
+ iteration 634/ 292968 | consumed samples: 1298432 | consumed tokens: 100892672 | elapsed time per iteration (ms): 107175.9 | learning rate: 3.462E-05 | global batch size: 2048 | lm loss: 5.320929E+00 | loss scale: 8192.0 | grad norm: 8954.510 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77244
+ time (ms)
77245
+ iteration 635/ 292968 | consumed samples: 1300480 | consumed tokens: 101089280 | elapsed time per iteration (ms): 107956.8 | learning rate: 3.468E-05 | global batch size: 2048 | lm loss: 5.306052E+00 | loss scale: 8192.0 | grad norm: 11726.553 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77246
+ time (ms)
77247
+ iteration 636/ 292968 | consumed samples: 1302528 | consumed tokens: 101285888 | elapsed time per iteration (ms): 107124.1 | learning rate: 3.473E-05 | global batch size: 2048 | lm loss: 5.340025E+00 | loss scale: 8192.0 | grad norm: 9664.223 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77248
+ time (ms)
77249
+ iteration 637/ 292968 | consumed samples: 1304576 | consumed tokens: 101482496 | elapsed time per iteration (ms): 107183.5 | learning rate: 3.479E-05 | global batch size: 2048 | lm loss: 5.298586E+00 | loss scale: 8192.0 | grad norm: 11783.685 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77250
+ time (ms)
77251
+ iteration 638/ 292968 | consumed samples: 1306624 | consumed tokens: 101679104 | elapsed time per iteration (ms): 107166.1 | learning rate: 3.484E-05 | global batch size: 2048 | lm loss: 5.315363E+00 | loss scale: 8192.0 | grad norm: 10217.252 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77252
+ time (ms)
77253
+ iteration 639/ 292968 | consumed samples: 1308672 | consumed tokens: 101875712 | elapsed time per iteration (ms): 107360.8 | learning rate: 3.490E-05 | global batch size: 2048 | lm loss: 5.312271E+00 | loss scale: 8192.0 | grad norm: 10486.233 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77254
+ time (ms)
77255
+ iteration 640/ 292968 | consumed samples: 1310720 | consumed tokens: 102072320 | elapsed time per iteration (ms): 108937.9 | learning rate: 3.495E-05 | global batch size: 2048 | lm loss: 5.286817E+00 | loss scale: 8192.0 | grad norm: 9778.188 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77256
+ time (ms)
77257
+ iteration 641/ 292968 | consumed samples: 1312768 | consumed tokens: 102268928 | elapsed time per iteration (ms): 107300.5 | learning rate: 3.501E-05 | global batch size: 2048 | lm loss: 5.298764E+00 | loss scale: 8192.0 | grad norm: 9331.960 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77258
+ time (ms)
77259
+ saving checkpoint at iteration 641 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
77260
+ [2021-10-25 07:27:49,703] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/mp_rank_00_model_states.pt
77261
+ [2021-10-25 07:27:49,848] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/mp_rank_01_model_states.pt
77262
+ [2021-10-25 07:28:02,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_107_optim_states.pt
77263
+ [2021-10-25 07:28:02,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_57_optim_states.pt
77264
+ [2021-10-25 07:28:02,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_102_optim_states.pt
77265
+ [2021-10-25 07:28:02,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_116_optim_states.pt
77266
+ [2021-10-25 07:28:02,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_63_optim_states.pt
77267
+ [2021-10-25 07:28:02,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_121_optim_states.pt
77268
+ [2021-10-25 07:28:02,769] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_28_optim_states.pt
77269
+ [2021-10-25 07:28:02,814] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_93_optim_states.pt
77270
+ [2021-10-25 07:28:02,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_114_optim_states.pt
77271
+ [2021-10-25 07:28:02,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_14_optim_states.pt
77272
+ [2021-10-25 07:28:02,860] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_101_optim_states.pt
77273
+ [2021-10-25 07:28:02,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_110_optim_states.pt
77274
+ [2021-10-25 07:28:02,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_62_optim_states.pt
77275
+ [2021-10-25 07:28:02,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_12_optim_states.pt
77276
+ [2021-10-25 07:28:02,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_105_optim_states.pt
77277
+ [2021-10-25 07:28:02,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_111_optim_states.pt
77278
+ [2021-10-25 07:28:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_118_optim_states.pt
77279
+ [2021-10-25 07:28:02,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_122_optim_states.pt
77280
+ [2021-10-25 07:28:02,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_30_optim_states.pt
77281
+ [2021-10-25 07:28:02,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_59_optim_states.pt
77282
+ [2021-10-25 07:28:02,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_20_optim_states.pt
77283
+ [2021-10-25 07:28:03,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_95_optim_states.pt
77284
+ [2021-10-25 07:28:03,004] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_18_optim_states.pt
77285
+ [2021-10-25 07:28:03,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_66_optim_states.pt
77286
+ [2021-10-25 07:28:03,035] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_99_optim_states.pt
77287
+ [2021-10-25 07:28:03,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_06_optim_states.pt
77288
+ [2021-10-25 07:28:03,100] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_19_optim_states.pt
77289
+ [2021-10-25 07:28:03,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_07_optim_states.pt
77290
+ [2021-10-25 07:28:03,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_21_optim_states.pt
77291
+ [2021-10-25 07:28:03,258] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_113_optim_states.pt
77292
+ [2021-10-25 07:28:03,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_71_optim_states.pt
77293
+ [2021-10-25 07:28:03,463] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_64_optim_states.pt
77294
+ [2021-10-25 07:28:03,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_48_optim_states.pt
77295
+ [2021-10-25 07:28:03,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_52_optim_states.pt
77296
+ [2021-10-25 07:28:03,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_50_optim_states.pt
77297
+ [2021-10-25 07:28:03,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_94_optim_states.pt
77298
+ [2021-10-25 07:28:03,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_60_optim_states.pt
77299
+ [2021-10-25 07:28:03,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_97_optim_states.pt
77300
+ [2021-10-25 07:28:03,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_09_optim_states.pt
77301
+ [2021-10-25 07:28:03,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_104_optim_states.pt
77302
+ [2021-10-25 07:28:03,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_123_optim_states.pt
77303
+ [2021-10-25 07:28:03,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_58_optim_states.pt
77304
+ [2021-10-25 07:28:03,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_15_optim_states.pt
77305
+ [2021-10-25 07:28:03,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_08_optim_states.pt
77306
+ [2021-10-25 07:28:03,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_72_optim_states.pt
77307
+ [2021-10-25 07:28:03,874] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_100_optim_states.pt
77308
+ [2021-10-25 07:28:03,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_25_optim_states.pt
77309
+ [2021-10-25 07:28:03,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_61_optim_states.pt
77310
+ [2021-10-25 07:28:03,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_119_optim_states.pt
77311
+ [2021-10-25 07:28:03,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_112_optim_states.pt
77312
+ [2021-10-25 07:28:03,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_32_optim_states.pt
77313
+ [2021-10-25 07:28:04,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_74_optim_states.pt
77314
+ [2021-10-25 07:28:04,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_92_optim_states.pt
77315
+ [2021-10-25 07:28:04,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_91_optim_states.pt
77316
+ [2021-10-25 07:28:04,036] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_36_optim_states.pt
77317
+ [2021-10-25 07:28:04,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_45_optim_states.pt
77318
+ [2021-10-25 07:28:04,042] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_98_optim_states.pt
77319
+ [2021-10-25 07:28:04,061] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_16_optim_states.pt
77320
+ [2021-10-25 07:28:04,065] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_103_optim_states.pt
77321
+ [2021-10-25 07:28:04,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_22_optim_states.pt
77322
+ [2021-10-25 07:28:04,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_23_optim_states.pt
77323
+ [2021-10-25 07:28:04,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_54_optim_states.pt
77324
+ [2021-10-25 07:28:04,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_43_optim_states.pt
77325
+ [2021-10-25 07:28:04,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_26_optim_states.pt
77326
+ [2021-10-25 07:28:04,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_106_optim_states.pt
77327
+ [2021-10-25 07:28:04,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_17_optim_states.pt
77328
+ [2021-10-25 07:28:04,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_96_optim_states.pt
77329
+ [2021-10-25 07:28:04,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_13_optim_states.pt
77330
+ [2021-10-25 07:28:04,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_04_optim_states.pt
77331
+ [2021-10-25 07:28:04,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_120_optim_states.pt
77332
+ [2021-10-25 07:28:04,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_56_optim_states.pt
77333
+ [2021-10-25 07:28:04,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_117_optim_states.pt
77334
+ [2021-10-25 07:28:04,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_05_optim_states.pt
77335
+ [2021-10-25 07:28:04,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_86_optim_states.pt
77336
+ [2021-10-25 07:28:04,228] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_24_optim_states.pt
77337
+ [2021-10-25 07:28:04,252] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_115_optim_states.pt
77338
+ [2021-10-25 07:28:04,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_109_optim_states.pt
77339
+ [2021-10-25 07:28:04,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_77_optim_states.pt
77340
+ [2021-10-25 07:28:04,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_27_optim_states.pt
77341
+ [2021-10-25 07:28:04,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_39_optim_states.pt
77342
+ [2021-10-25 07:28:04,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_76_optim_states.pt
77343
+ [2021-10-25 07:28:04,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_31_optim_states.pt
77344
+ [2021-10-25 07:28:04,385] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_46_optim_states.pt
77345
+ [2021-10-25 07:28:04,393] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_35_optim_states.pt
77346
+ [2021-10-25 07:28:04,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_81_optim_states.pt
77347
+ [2021-10-25 07:28:04,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_87_optim_states.pt
77348
+ [2021-10-25 07:28:04,438] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_68_optim_states.pt
77349
+ [2021-10-25 07:28:04,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_40_optim_states.pt
77350
+ [2021-10-25 07:28:04,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_88_optim_states.pt
77351
+ [2021-10-25 07:28:04,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_108_optim_states.pt
77352
+ [2021-10-25 07:28:04,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_79_optim_states.pt
77353
+ [2021-10-25 07:28:04,500] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_11_optim_states.pt
77354
+ [2021-10-25 07:28:04,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_69_optim_states.pt
77355
+ [2021-10-25 07:28:04,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_80_optim_states.pt
77356
+ [2021-10-25 07:28:04,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_47_optim_states.pt
77357
+ [2021-10-25 07:28:04,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_85_optim_states.pt
77358
+ [2021-10-25 07:28:04,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_70_optim_states.pt
77359
+ [2021-10-25 07:28:04,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_55_optim_states.pt
77360
+ [2021-10-25 07:28:04,743] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_49_optim_states.pt
77361
+ [2021-10-25 07:28:04,766] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_83_optim_states.pt
77362
+ [2021-10-25 07:28:04,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_53_optim_states.pt
77363
+ [2021-10-25 07:28:04,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_10_optim_states.pt
77364
+ [2021-10-25 07:28:04,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_41_optim_states.pt
77365
+ [2021-10-25 07:28:04,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_90_optim_states.pt
77366
+ [2021-10-25 07:28:04,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_42_optim_states.pt
77367
+ [2021-10-25 07:28:04,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_33_optim_states.pt
77368
+ [2021-10-25 07:28:04,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_38_optim_states.pt
77369
+ [2021-10-25 07:28:04,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_89_optim_states.pt
77370
+ [2021-10-25 07:28:04,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_84_optim_states.pt
77371
+ [2021-10-25 07:28:05,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_82_optim_states.pt
77372
+ [2021-10-25 07:28:05,072] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_34_optim_states.pt
77373
+ [2021-10-25 07:28:05,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_44_optim_states.pt
77374
+ [2021-10-25 07:28:05,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_51_optim_states.pt
77375
+ [2021-10-25 07:28:05,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_78_optim_states.pt
77376
+ [2021-10-25 07:28:05,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_37_optim_states.pt
77377
+ [2021-10-25 07:28:05,416] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_125_optim_states.pt
77378
+ [2021-10-25 07:28:05,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_00_optim_states.pt
77379
+ [2021-10-25 07:28:05,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_02_optim_states.pt
77380
+ [2021-10-25 07:28:05,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_126_optim_states.pt
77381
+ [2021-10-25 07:28:05,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_29_optim_states.pt
77382
+ [2021-10-25 07:28:06,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_127_optim_states.pt
77383
+ [2021-10-25 07:28:07,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_124_optim_states.pt
77384
+ [2021-10-25 07:28:11,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_75_optim_states.pt
77385
+ [2021-10-25 07:28:11,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_65_optim_states.pt
77386
+ [2021-10-25 07:28:12,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_73_optim_states.pt
77387
+ [2021-10-25 07:28:13,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_67_optim_states.pt
77388
+ [2021-10-25 07:28:16,023] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_03_optim_states.pt
77389
+ [2021-10-25 07:28:16,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step641/zero_pp_rank_0_mp_rank_01_optim_states.pt
77390
+ successfully saved checkpoint at iteration 641 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
77391
+ time (ms) | save-checkpoint: 29241.51
77392
+ [exiting program after 1191.7797291556994 minutes] datetime: 2021-10-25 07:28:16