bigscience-bot committed on
Commit
cb35e14
1 Parent(s): 2dba181
Files changed (1)
  1. logs/main_log.txt +200 -0
logs/main_log.txt CHANGED
@@ -77002,3 +77002,203 @@ time (ms)
77002
  time (ms)
77003
  iteration 582/ 292968 | consumed samples: 1191936 | consumed tokens: 90669056 | elapsed time per iteration (ms): 108157.8 | learning rate: 3.178E-05 | global batch size: 2048 | lm loss: 5.475502E+00 | loss scale: 8192.0 | grad norm: 10832.692 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77004
  time (ms)
77005
+ iteration 583/ 292968 | consumed samples: 1193984 | consumed tokens: 90865664 | elapsed time per iteration (ms): 108967.2 | learning rate: 3.184E-05 | global batch size: 2048 | lm loss: 5.494294E+00 | loss scale: 8192.0 | grad norm: 14744.932 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77006
+ time (ms)
77007
+ iteration 584/ 292968 | consumed samples: 1196032 | consumed tokens: 91062272 | elapsed time per iteration (ms): 106812.8 | learning rate: 3.189E-05 | global batch size: 2048 | lm loss: 5.487658E+00 | loss scale: 8192.0 | grad norm: 8967.567 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77008
+ time (ms)
77009
+ iteration 585/ 292968 | consumed samples: 1198080 | consumed tokens: 91258880 | elapsed time per iteration (ms): 110130.1 | learning rate: 3.195E-05 | global batch size: 2048 | lm loss: 5.488459E+00 | loss scale: 8192.0 | grad norm: 14768.019 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77010
+ time (ms)
77011
+ iteration 586/ 292968 | consumed samples: 1200128 | consumed tokens: 91455488 | elapsed time per iteration (ms): 106231.0 | learning rate: 3.200E-05 | global batch size: 2048 | lm loss: 5.488029E+00 | loss scale: 8192.0 | grad norm: 13756.417 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77012
+ time (ms)
77013
+ iteration 587/ 292968 | consumed samples: 1202176 | consumed tokens: 91652096 | elapsed time per iteration (ms): 106565.7 | learning rate: 3.206E-05 | global batch size: 2048 | lm loss: 5.448896E+00 | loss scale: 8192.0 | grad norm: 8670.093 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77014
+ time (ms)
77015
+ iteration 588/ 292968 | consumed samples: 1204224 | consumed tokens: 91848704 | elapsed time per iteration (ms): 106823.5 | learning rate: 3.211E-05 | global batch size: 2048 | lm loss: 5.481108E+00 | loss scale: 8192.0 | grad norm: 13747.563 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77016
+ time (ms)
77017
+ iteration 589/ 292968 | consumed samples: 1206272 | consumed tokens: 92045312 | elapsed time per iteration (ms): 109210.1 | learning rate: 3.217E-05 | global batch size: 2048 | lm loss: 5.483897E+00 | loss scale: 8192.0 | grad norm: 13030.572 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77018
+ time (ms)
77019
+ iteration 590/ 292968 | consumed samples: 1208320 | consumed tokens: 92241920 | elapsed time per iteration (ms): 107071.2 | learning rate: 3.222E-05 | global batch size: 2048 | lm loss: 5.499794E+00 | loss scale: 8192.0 | grad norm: 12956.695 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77020
+ time (ms)
77021
+ iteration 591/ 292968 | consumed samples: 1210368 | consumed tokens: 92438528 | elapsed time per iteration (ms): 107481.3 | learning rate: 3.228E-05 | global batch size: 2048 | lm loss: 5.458858E+00 | loss scale: 8192.0 | grad norm: 8716.189 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77022
+ time (ms)
77023
+ iteration 592/ 292968 | consumed samples: 1212416 | consumed tokens: 92635136 | elapsed time per iteration (ms): 108187.6 | learning rate: 3.233E-05 | global batch size: 2048 | lm loss: 5.468006E+00 | loss scale: 8192.0 | grad norm: 10982.591 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77024
+ time (ms)
77025
+ iteration 593/ 292968 | consumed samples: 1214464 | consumed tokens: 92831744 | elapsed time per iteration (ms): 107146.7 | learning rate: 3.239E-05 | global batch size: 2048 | lm loss: 5.428665E+00 | loss scale: 8192.0 | grad norm: 10539.232 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77026
+ time (ms)
77027
+ iteration 594/ 292968 | consumed samples: 1216512 | consumed tokens: 93028352 | elapsed time per iteration (ms): 110124.1 | learning rate: 3.244E-05 | global batch size: 2048 | lm loss: 5.442387E+00 | loss scale: 8192.0 | grad norm: 13381.277 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77028
+ time (ms)
77029
+ iteration 595/ 292968 | consumed samples: 1218560 | consumed tokens: 93224960 | elapsed time per iteration (ms): 106387.0 | learning rate: 3.249E-05 | global batch size: 2048 | lm loss: 5.484375E+00 | loss scale: 8192.0 | grad norm: 11482.399 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77030
+ time (ms)
77031
+ iteration 596/ 292968 | consumed samples: 1220608 | consumed tokens: 93421568 | elapsed time per iteration (ms): 108330.7 | learning rate: 3.255E-05 | global batch size: 2048 | lm loss: 5.424896E+00 | loss scale: 8192.0 | grad norm: 12097.178 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77032
+ time (ms)
77033
+ iteration 597/ 292968 | consumed samples: 1222656 | consumed tokens: 93618176 | elapsed time per iteration (ms): 107065.9 | learning rate: 3.260E-05 | global batch size: 2048 | lm loss: 5.433896E+00 | loss scale: 8192.0 | grad norm: 15293.672 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77034
+ time (ms)
77035
+ iteration 598/ 292968 | consumed samples: 1224704 | consumed tokens: 93814784 | elapsed time per iteration (ms): 106989.0 | learning rate: 3.266E-05 | global batch size: 2048 | lm loss: 5.436405E+00 | loss scale: 8192.0 | grad norm: 11111.761 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77036
+ time (ms)
77037
+ iteration 599/ 292968 | consumed samples: 1226752 | consumed tokens: 94011392 | elapsed time per iteration (ms): 106858.4 | learning rate: 3.271E-05 | global batch size: 2048 | lm loss: 5.414397E+00 | loss scale: 8192.0 | grad norm: 13962.838 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77038
+ time (ms)
77039
+ iteration 600/ 292968 | consumed samples: 1228800 | consumed tokens: 94208000 | elapsed time per iteration (ms): 107260.3 | learning rate: 3.277E-05 | global batch size: 2048 | lm loss: 5.419570E+00 | loss scale: 8192.0 | grad norm: 11387.759 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
77040
+ time (ms)
77041
+ -----------------------------------------------------------------------------------------------
77042
+ validation loss at iteration 600 | lm loss value: 5.387414E+00 | lm loss PPL: 2.186374E+02 |
77043
+ -----------------------------------------------------------------------------------------------
77044
+ saving checkpoint at iteration 600 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
77045
+ [2021-10-25 06:13:42,645] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/mp_rank_01_model_states.pt
77046
+ [2021-10-25 06:13:43,582] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/mp_rank_00_model_states.pt
77047
+ [2021-10-25 06:13:56,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_17_optim_states.pt
77048
+ [2021-10-25 06:13:56,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_102_optim_states.pt
77049
+ [2021-10-25 06:13:56,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_16_optim_states.pt
77050
+ [2021-10-25 06:13:56,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_28_optim_states.pt
77051
+ [2021-10-25 06:13:56,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_21_optim_states.pt
77052
+ [2021-10-25 06:13:56,535] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_108_optim_states.pt
77053
+ [2021-10-25 06:13:56,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_93_optim_states.pt
77054
+ [2021-10-25 06:13:56,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_12_optim_states.pt
77055
+ [2021-10-25 06:13:56,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_116_optim_states.pt
77056
+ [2021-10-25 06:13:56,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_95_optim_states.pt
77057
+ [2021-10-25 06:13:56,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_122_optim_states.pt
77058
+ [2021-10-25 06:13:56,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_06_optim_states.pt
77059
+ [2021-10-25 06:13:56,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_62_optim_states.pt
77060
+ [2021-10-25 06:13:56,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_123_optim_states.pt
77061
+ [2021-10-25 06:13:56,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_67_optim_states.pt
77062
+ [2021-10-25 06:13:56,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_59_optim_states.pt
77063
+ [2021-10-25 06:13:56,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_07_optim_states.pt
77064
+ [2021-10-25 06:13:56,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_99_optim_states.pt
77065
+ [2021-10-25 06:13:56,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_100_optim_states.pt
77066
+ [2021-10-25 06:13:56,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_30_optim_states.pt
77067
+ [2021-10-25 06:13:56,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_114_optim_states.pt
77068
+ [2021-10-25 06:13:56,829] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_111_optim_states.pt
77069
+ [2021-10-25 06:13:56,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_118_optim_states.pt
77070
+ [2021-10-25 06:13:56,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_57_optim_states.pt
77071
+ [2021-10-25 06:13:56,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_107_optim_states.pt
77072
+ [2021-10-25 06:13:57,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_23_optim_states.pt
77073
+ [2021-10-25 06:13:57,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_65_optim_states.pt
77074
+ [2021-10-25 06:13:57,115] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_105_optim_states.pt
77075
+ [2021-10-25 06:13:57,128] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_15_optim_states.pt
77076
+ [2021-10-25 06:13:57,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_60_optim_states.pt
77077
+ [2021-10-25 06:13:57,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_113_optim_states.pt
77078
+ [2021-10-25 06:13:57,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_52_optim_states.pt
77079
+ [2021-10-25 06:13:57,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_14_optim_states.pt
77080
+ [2021-10-25 06:13:57,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_71_optim_states.pt
77081
+ [2021-10-25 06:13:57,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_83_optim_states.pt
77082
+ [2021-10-25 06:13:57,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_43_optim_states.pt
77083
+ [2021-10-25 06:13:57,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_04_optim_states.pt
77084
+ [2021-10-25 06:13:57,526] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_45_optim_states.pt
77085
+ [2021-10-25 06:13:57,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_101_optim_states.pt
77086
+ [2021-10-25 06:13:57,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_58_optim_states.pt
77087
+ [2021-10-25 06:13:57,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_09_optim_states.pt
77088
+ [2021-10-25 06:13:57,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_26_optim_states.pt
77089
+ [2021-10-25 06:13:57,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_18_optim_states.pt
77090
+ [2021-10-25 06:13:57,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_72_optim_states.pt
77091
+ [2021-10-25 06:13:57,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_103_optim_states.pt
77092
+ [2021-10-25 06:13:57,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_94_optim_states.pt
77093
+ [2021-10-25 06:13:57,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_22_optim_states.pt
77094
+ [2021-10-25 06:13:57,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_11_optim_states.pt
77095
+ [2021-10-25 06:13:57,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_19_optim_states.pt
77096
+ [2021-10-25 06:13:57,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_97_optim_states.pt
77097
+ [2021-10-25 06:13:57,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_119_optim_states.pt
77098
+ [2021-10-25 06:13:57,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_25_optim_states.pt
77099
+ [2021-10-25 06:13:57,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_78_optim_states.pt
77100
+ [2021-10-25 06:13:57,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_56_optim_states.pt
77101
+ [2021-10-25 06:13:57,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_50_optim_states.pt
77102
+ [2021-10-25 06:13:57,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_104_optim_states.pt
77103
+ [2021-10-25 06:13:57,763] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_92_optim_states.pt
77104
+ [2021-10-25 06:13:57,774] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_13_optim_states.pt
77105
+ [2021-10-25 06:13:57,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_61_optim_states.pt
77106
+ [2021-10-25 06:13:57,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_98_optim_states.pt
77107
+ [2021-10-25 06:13:57,822] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_24_optim_states.pt
77108
+ [2021-10-25 06:13:57,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_90_optim_states.pt
77109
+ [2021-10-25 06:13:57,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_36_optim_states.pt
77110
+ [2021-10-25 06:13:57,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_110_optim_states.pt
77111
+ [2021-10-25 06:13:57,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_117_optim_states.pt
77112
+ [2021-10-25 06:13:57,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_05_optim_states.pt
77113
+ [2021-10-25 06:13:57,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_109_optim_states.pt
77114
+ [2021-10-25 06:13:57,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_121_optim_states.pt
77115
+ [2021-10-25 06:13:57,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_112_optim_states.pt
77116
+ [2021-10-25 06:13:57,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_51_optim_states.pt
77117
+ [2021-10-25 06:13:57,889] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_96_optim_states.pt
77118
+ [2021-10-25 06:13:57,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_33_optim_states.pt
77119
+ [2021-10-25 06:13:57,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_74_optim_states.pt
77120
+ [2021-10-25 06:13:57,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_63_optim_states.pt
77121
+ [2021-10-25 06:13:57,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_68_optim_states.pt
77122
+ [2021-10-25 06:13:57,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_106_optim_states.pt
77123
+ [2021-10-25 06:13:57,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_120_optim_states.pt
77124
+ [2021-10-25 06:13:57,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_54_optim_states.pt
77125
+ [2021-10-25 06:13:57,975] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_10_optim_states.pt
77126
+ [2021-10-25 06:13:57,992] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_20_optim_states.pt
77127
+ [2021-10-25 06:13:58,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_35_optim_states.pt
77128
+ [2021-10-25 06:13:58,052] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_81_optim_states.pt
77129
+ [2021-10-25 06:13:58,074] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_115_optim_states.pt
77130
+ [2021-10-25 06:13:58,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_27_optim_states.pt
77131
+ [2021-10-25 06:13:58,098] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_86_optim_states.pt
77132
+ [2021-10-25 06:13:58,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_46_optim_states.pt
77133
+ [2021-10-25 06:13:58,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_87_optim_states.pt
77134
+ [2021-10-25 06:13:58,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_41_optim_states.pt
77135
+ [2021-10-25 06:13:58,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_55_optim_states.pt
77136
+ [2021-10-25 06:13:58,258] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_77_optim_states.pt
77137
+ [2021-10-25 06:13:58,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_40_optim_states.pt
77138
+ [2021-10-25 06:13:58,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_39_optim_states.pt
77139
+ [2021-10-25 06:13:58,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_88_optim_states.pt
77140
+ [2021-10-25 06:13:58,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_79_optim_states.pt
77141
+ [2021-10-25 06:13:58,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_69_optim_states.pt
77142
+ [2021-10-25 06:13:58,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_42_optim_states.pt
77143
+ [2021-10-25 06:13:58,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_31_optim_states.pt
77144
+ [2021-10-25 06:13:58,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_84_optim_states.pt
77145
+ [2021-10-25 06:13:58,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_89_optim_states.pt
77146
+ [2021-10-25 06:13:58,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_47_optim_states.pt
77147
+ [2021-10-25 06:13:58,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_38_optim_states.pt
77148
+ [2021-10-25 06:13:58,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_80_optim_states.pt
77149
+ [2021-10-25 06:13:58,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_82_optim_states.pt
77150
+ [2021-10-25 06:13:58,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_70_optim_states.pt
+ [2021-10-25 06:13:58,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_53_optim_states.pt
+ [2021-10-25 06:13:58,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_34_optim_states.pt
+ [2021-10-25 06:13:58,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_44_optim_states.pt
+ [2021-10-25 06:13:58,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_91_optim_states.pt
+ [2021-10-25 06:13:58,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_76_optim_states.pt
+ [2021-10-25 06:13:58,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_49_optim_states.pt
+ [2021-10-25 06:13:58,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_32_optim_states.pt
+ [2021-10-25 06:13:58,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_48_optim_states.pt
+ [2021-10-25 06:13:58,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_37_optim_states.pt
+ [2021-10-25 06:13:58,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_85_optim_states.pt
+ [2021-10-25 06:13:58,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_08_optim_states.pt
+ [2021-10-25 06:13:59,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_125_optim_states.pt
+ [2021-10-25 06:13:59,405] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_127_optim_states.pt
+ [2021-10-25 06:13:59,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_29_optim_states.pt
+ [2021-10-25 06:14:00,054] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_03_optim_states.pt
+ [2021-10-25 06:14:00,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_02_optim_states.pt
+ [2021-10-25 06:14:00,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_126_optim_states.pt
+ [2021-10-25 06:14:00,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_124_optim_states.pt
+ [2021-10-25 06:14:04,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_75_optim_states.pt
+ [2021-10-25 06:14:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_64_optim_states.pt
+ [2021-10-25 06:14:06,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_73_optim_states.pt
+ [2021-10-25 06:14:06,857] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_66_optim_states.pt
+ [2021-10-25 06:14:12,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_01_optim_states.pt
+ [2021-10-25 06:14:13,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt
+ successfully saved checkpoint at iteration 600 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ time (ms) | save-checkpoint: 34761.21
+ iteration 601/ 292968 | consumed samples: 1230848 | consumed tokens: 94404608 | elapsed time per iteration (ms): 304940.5 | learning rate: 3.282E-05 | global batch size: 2048 | lm loss: 5.396969E+00 | loss scale: 8192.0 | grad norm: 12332.412 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 602/ 292968 | consumed samples: 1232896 | consumed tokens: 94601216 | elapsed time per iteration (ms): 106807.5 | learning rate: 3.288E-05 | global batch size: 2048 | lm loss: 5.408408E+00 | loss scale: 8192.0 | grad norm: 11929.351 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 603/ 292968 | consumed samples: 1234944 | consumed tokens: 94797824 | elapsed time per iteration (ms): 107857.1 | learning rate: 3.293E-05 | global batch size: 2048 | lm loss: 5.420089E+00 | loss scale: 8192.0 | grad norm: 11171.102 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 604/ 292968 | consumed samples: 1236992 | consumed tokens: 94994432 | elapsed time per iteration (ms): 107461.0 | learning rate: 3.299E-05 | global batch size: 2048 | lm loss: 5.418396E+00 | loss scale: 8192.0 | grad norm: 9342.805 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 605/ 292968 | consumed samples: 1239040 | consumed tokens: 95191040 | elapsed time per iteration (ms): 107939.7 | learning rate: 3.304E-05 | global batch size: 2048 | lm loss: 5.415629E+00 | loss scale: 8192.0 | grad norm: 12331.412 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 606/ 292968 | consumed samples: 1241088 | consumed tokens: 95387648 | elapsed time per iteration (ms): 106693.6 | learning rate: 3.310E-05 | global batch size: 2048 | lm loss: 5.435667E+00 | loss scale: 8192.0 | grad norm: 16086.731 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 607/ 292968 | consumed samples: 1243136 | consumed tokens: 95584256 | elapsed time per iteration (ms): 107708.8 | learning rate: 3.315E-05 | global batch size: 2048 | lm loss: 5.409382E+00 | loss scale: 8192.0 | grad norm: 9374.954 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 608/ 292968 | consumed samples: 1245184 | consumed tokens: 95780864 | elapsed time per iteration (ms): 107679.7 | learning rate: 3.320E-05 | global batch size: 2048 | lm loss: 5.423688E+00 | loss scale: 8192.0 | grad norm: 12232.800 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 609/ 292968 | consumed samples: 1247232 | consumed tokens: 95977472 | elapsed time per iteration (ms): 108222.9 | learning rate: 3.326E-05 | global batch size: 2048 | lm loss: 5.402236E+00 | loss scale: 8192.0 | grad norm: 9228.233 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 610/ 292968 | consumed samples: 1249280 | consumed tokens: 96174080 | elapsed time per iteration (ms): 107400.0 | learning rate: 3.331E-05 | global batch size: 2048 | lm loss: 5.412461E+00 | loss scale: 8192.0 | grad norm: 11245.757 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 611/ 292968 | consumed samples: 1251328 | consumed tokens: 96370688 | elapsed time per iteration (ms): 106468.7 | learning rate: 3.337E-05 | global batch size: 2048 | lm loss: 5.408649E+00 | loss scale: 8192.0 | grad norm: 11344.448 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 612/ 292968 | consumed samples: 1253376 | consumed tokens: 96567296 | elapsed time per iteration (ms): 107650.3 | learning rate: 3.342E-05 | global batch size: 2048 | lm loss: 5.407639E+00 | loss scale: 8192.0 | grad norm: 11098.585 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 613/ 292968 | consumed samples: 1255424 | consumed tokens: 96763904 | elapsed time per iteration (ms): 107751.1 | learning rate: 3.348E-05 | global batch size: 2048 | lm loss: 5.380627E+00 | loss scale: 8192.0 | grad norm: 8762.937 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 614/ 292968 | consumed samples: 1257472 | consumed tokens: 96960512 | elapsed time per iteration (ms): 110635.4 | learning rate: 3.353E-05 | global batch size: 2048 | lm loss: 5.375699E+00 | loss scale: 8192.0 | grad norm: 11229.270 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)