bigscience-bot committed on
Commit
8c85f0d
1 Parent(s): f8127e9
Files changed (1)
  1. logs/main_log.txt +64 -0
logs/main_log.txt CHANGED
@@ -76874,3 +76874,67 @@ time (ms)
76874
  time (ms)
76875
  iteration 518/ 292968 | consumed samples: 1060864 | consumed tokens: 79101952 | elapsed time per iteration (ms): 111964.5 | learning rate: 2.829E-05 | global batch size: 2048 | lm loss: 5.635888E+00 | loss scale: 8192.0 | grad norm: 12731.825 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76876
  time (ms)
76877
+ iteration 519/ 292968 | consumed samples: 1062912 | consumed tokens: 79282176 | elapsed time per iteration (ms): 112055.2 | learning rate: 2.834E-05 | global batch size: 2048 | lm loss: 5.601050E+00 | loss scale: 8192.0 | grad norm: 11635.834 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76878
+ time (ms)
76879
+ iteration 520/ 292968 | consumed samples: 1064960 | consumed tokens: 79462400 | elapsed time per iteration (ms): 112418.5 | learning rate: 2.840E-05 | global batch size: 2048 | lm loss: 5.645939E+00 | loss scale: 8192.0 | grad norm: 17715.201 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76880
+ time (ms)
76881
+ iteration 521/ 292968 | consumed samples: 1067008 | consumed tokens: 79642624 | elapsed time per iteration (ms): 111558.5 | learning rate: 2.845E-05 | global batch size: 2048 | lm loss: 5.586247E+00 | loss scale: 8192.0 | grad norm: 9433.316 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76882
+ time (ms)
76883
+ iteration 522/ 292968 | consumed samples: 1069056 | consumed tokens: 79822848 | elapsed time per iteration (ms): 113098.8 | learning rate: 2.851E-05 | global batch size: 2048 | lm loss: 5.607241E+00 | loss scale: 8192.0 | grad norm: 11954.691 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76884
+ time (ms)
76885
+ iteration 523/ 292968 | consumed samples: 1071104 | consumed tokens: 80003072 | elapsed time per iteration (ms): 112106.8 | learning rate: 2.856E-05 | global batch size: 2048 | lm loss: 5.652853E+00 | loss scale: 8192.0 | grad norm: 16648.802 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76886
+ time (ms)
76887
+ iteration 524/ 292968 | consumed samples: 1073152 | consumed tokens: 80183296 | elapsed time per iteration (ms): 112809.7 | learning rate: 2.862E-05 | global batch size: 2048 | lm loss: 5.599886E+00 | loss scale: 8192.0 | grad norm: 9193.022 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76888
+ time (ms)
76889
+ iteration 525/ 292968 | consumed samples: 1075200 | consumed tokens: 80363520 | elapsed time per iteration (ms): 114026.4 | learning rate: 2.867E-05 | global batch size: 2048 | lm loss: 5.635831E+00 | loss scale: 8192.0 | grad norm: 22370.033 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76890
+ time (ms)
76891
+ iteration 526/ 292968 | consumed samples: 1077248 | consumed tokens: 80543744 | elapsed time per iteration (ms): 112873.4 | learning rate: 2.873E-05 | global batch size: 2048 | lm loss: 5.630721E+00 | loss scale: 8192.0 | grad norm: 11212.895 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76892
+ time (ms)
76893
+ iteration 527/ 292968 | consumed samples: 1079296 | consumed tokens: 80723968 | elapsed time per iteration (ms): 112562.2 | learning rate: 2.878E-05 | global batch size: 2048 | lm loss: 5.617833E+00 | loss scale: 8192.0 | grad norm: 16194.164 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76894
+ time (ms)
76895
+ iteration 528/ 292968 | consumed samples: 1081344 | consumed tokens: 80904192 | elapsed time per iteration (ms): 112871.5 | learning rate: 2.884E-05 | global batch size: 2048 | lm loss: 5.614437E+00 | loss scale: 8192.0 | grad norm: 13321.010 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76896
+ time (ms)
76897
+ iteration 529/ 292968 | consumed samples: 1083392 | consumed tokens: 81084416 | elapsed time per iteration (ms): 112230.3 | learning rate: 2.889E-05 | global batch size: 2048 | lm loss: 5.596371E+00 | loss scale: 8192.0 | grad norm: 9818.933 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76898
+ time (ms)
76899
+ iteration 530/ 292968 | consumed samples: 1085440 | consumed tokens: 81264640 | elapsed time per iteration (ms): 111781.8 | learning rate: 2.895E-05 | global batch size: 2048 | lm loss: 5.628756E+00 | loss scale: 8192.0 | grad norm: 15970.761 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76900
+ time (ms)
76901
+ iteration 531/ 292968 | consumed samples: 1087488 | consumed tokens: 81444864 | elapsed time per iteration (ms): 112070.8 | learning rate: 2.900E-05 | global batch size: 2048 | lm loss: 5.574606E+00 | loss scale: 8192.0 | grad norm: 12453.852 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76902
+ time (ms)
76903
+ iteration 532/ 292968 | consumed samples: 1089536 | consumed tokens: 81625088 | elapsed time per iteration (ms): 111479.9 | learning rate: 2.905E-05 | global batch size: 2048 | lm loss: 5.553162E+00 | loss scale: 8192.0 | grad norm: 12601.321 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76904
+ time (ms)
76905
+ iteration 533/ 292968 | consumed samples: 1091584 | consumed tokens: 81805312 | elapsed time per iteration (ms): 111390.0 | learning rate: 2.911E-05 | global batch size: 2048 | lm loss: 5.609733E+00 | loss scale: 8192.0 | grad norm: 13511.849 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76906
+ time (ms)
76907
+ iteration 534/ 292968 | consumed samples: 1093632 | consumed tokens: 81985536 | elapsed time per iteration (ms): 112213.7 | learning rate: 2.916E-05 | global batch size: 2048 | lm loss: 5.583689E+00 | loss scale: 8192.0 | grad norm: 11190.455 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76908
+ time (ms)
76909
+ iteration 535/ 292968 | consumed samples: 1095680 | consumed tokens: 82165760 | elapsed time per iteration (ms): 112993.1 | learning rate: 2.922E-05 | global batch size: 2048 | lm loss: 5.653582E+00 | loss scale: 8192.0 | grad norm: 20818.658 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76910
+ time (ms)
76911
+ iteration 536/ 292968 | consumed samples: 1097728 | consumed tokens: 82345984 | elapsed time per iteration (ms): 112307.7 | learning rate: 2.927E-05 | global batch size: 2048 | lm loss: 5.611212E+00 | loss scale: 8192.0 | grad norm: 10362.696 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76912
+ time (ms)
76913
+ iteration 537/ 292968 | consumed samples: 1099776 | consumed tokens: 82526208 | elapsed time per iteration (ms): 112970.5 | learning rate: 2.933E-05 | global batch size: 2048 | lm loss: 5.618240E+00 | loss scale: 8192.0 | grad norm: 14839.821 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76914
+ time (ms)
76915
+ iteration 538/ 292968 | consumed samples: 1101824 | consumed tokens: 82706432 | elapsed time per iteration (ms): 113120.5 | learning rate: 2.938E-05 | global batch size: 2048 | lm loss: 5.594517E+00 | loss scale: 8192.0 | grad norm: 13605.480 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76916
+ time (ms)
76917
+ iteration 539/ 292968 | consumed samples: 1103872 | consumed tokens: 82886656 | elapsed time per iteration (ms): 112476.6 | learning rate: 2.944E-05 | global batch size: 2048 | lm loss: 5.556248E+00 | loss scale: 8192.0 | grad norm: 13800.093 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76918
+ time (ms)
76919
+ iteration 540/ 292968 | consumed samples: 1105920 | consumed tokens: 83066880 | elapsed time per iteration (ms): 114182.8 | learning rate: 2.949E-05 | global batch size: 2048 | lm loss: 5.591393E+00 | loss scale: 8192.0 | grad norm: 10588.037 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76920
+ time (ms)
76921
+ iteration 541/ 292968 | consumed samples: 1107968 | consumed tokens: 83247104 | elapsed time per iteration (ms): 110876.2 | learning rate: 2.955E-05 | global batch size: 2048 | lm loss: 5.556509E+00 | loss scale: 8192.0 | grad norm: 13801.950 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76922
+ time (ms)
76923
+ iteration 542/ 292968 | consumed samples: 1110016 | consumed tokens: 83427328 | elapsed time per iteration (ms): 111658.7 | learning rate: 2.960E-05 | global batch size: 2048 | lm loss: 5.569237E+00 | loss scale: 8192.0 | grad norm: 14005.832 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76924
+ time (ms)
76925
+ iteration 543/ 292968 | consumed samples: 1112064 | consumed tokens: 83607552 | elapsed time per iteration (ms): 112214.1 | learning rate: 2.966E-05 | global batch size: 2048 | lm loss: 5.546272E+00 | loss scale: 8192.0 | grad norm: 11650.584 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76926
+ time (ms)
76927
+ iteration 544/ 292968 | consumed samples: 1114112 | consumed tokens: 83787776 | elapsed time per iteration (ms): 113179.9 | learning rate: 2.971E-05 | global batch size: 2048 | lm loss: 5.549253E+00 | loss scale: 8192.0 | grad norm: 13630.378 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76928
+ time (ms)
76929
+ iteration 545/ 292968 | consumed samples: 1116160 | consumed tokens: 83968000 | elapsed time per iteration (ms): 112602.2 | learning rate: 2.976E-05 | global batch size: 2048 | lm loss: 5.533734E+00 | loss scale: 8192.0 | grad norm: 10491.189 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76930
+ time (ms)
76931
+ iteration 546/ 292968 | consumed samples: 1118208 | consumed tokens: 84148224 | elapsed time per iteration (ms): 112024.9 | learning rate: 2.982E-05 | global batch size: 2048 | lm loss: 5.555665E+00 | loss scale: 8192.0 | grad norm: 14130.965 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76932
+ time (ms)
76933
+ iteration 547/ 292968 | consumed samples: 1120256 | consumed tokens: 84328448 | elapsed time per iteration (ms): 112655.1 | learning rate: 2.987E-05 | global batch size: 2048 | lm loss: 5.551611E+00 | loss scale: 8192.0 | grad norm: 12855.412 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76934
+ time (ms)
76935
+ iteration 548/ 292968 | consumed samples: 1122304 | consumed tokens: 84508672 | elapsed time per iteration (ms): 111576.2 | learning rate: 2.993E-05 | global batch size: 2048 | lm loss: 5.609882E+00 | loss scale: 8192.0 | grad norm: 15275.244 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76936
+ time (ms)
76937
+ iteration 549/ 292968 | consumed samples: 1124352 | consumed tokens: 84688896 | elapsed time per iteration (ms): 112577.9 | learning rate: 2.998E-05 | global batch size: 2048 | lm loss: 5.596916E+00 | loss scale: 8192.0 | grad norm: 13652.963 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76938
+ time (ms)
76939
+ iteration 550/ 292968 | consumed samples: 1126400 | consumed tokens: 84869120 | elapsed time per iteration (ms): 112519.9 | learning rate: 3.004E-05 | global batch size: 2048 | lm loss: 5.550436E+00 | loss scale: 8192.0 | grad norm: 10479.605 | num zeros: 0.0 | curriculum seqlen: 88 | number of skipped iterations: 0 | number of nan iterations: 0 |
76940
+ time (ms)
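
The added log lines can be sanity-checked mechanically. The sketch below is a minimal, hypothetical parser (the names `parse_log_line` and `_num` are illustrative and not part of the training code) that splits one `iteration ... | key: value | ...` line into a dictionary. It then checks an arithmetic relationship visible in the values logged above: between iterations 519 and 520, `consumed samples` grows by the `global batch size` (2048) and `consumed tokens` grows by `global batch size` times `curriculum seqlen` (2048 * 88 = 180224). That this holds outside the lines shown here is an assumption.

```python
import re

# Matches "key: value |" pairs; keys may contain letters, spaces, and parentheses.
FIELD_RE = re.compile(r"([A-Za-z() ]+?):\s*([-+0-9.Ee]+)\s*\|")
# Matches the leading "iteration N/ TOTAL |" part (a diff "+ " prefix is tolerated).
HEAD_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)\s*\|")


def _num(text):
    """Return an int when the text is a plain integer, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_log_line(line):
    """Parse one iteration line into a dict, or return None if it is not one."""
    head = HEAD_RE.search(line)
    if head is None:
        return None
    record = {
        "iteration": int(head.group(1)),
        "total iterations": int(head.group(2)),
    }
    for key, value in FIELD_RE.findall(line[head.end():]):
        record[key.strip()] = _num(value)
    return record


# Two lines copied from the diff above (iterations 519 and 520).
LINE_519 = (
    "+ iteration 519/ 292968 | consumed samples: 1062912 | consumed tokens: 79282176 | "
    "elapsed time per iteration (ms): 112055.2 | learning rate: 2.834E-05 | "
    "global batch size: 2048 | lm loss: 5.601050E+00 | loss scale: 8192.0 | "
    "grad norm: 11635.834 | num zeros: 0.0 | curriculum seqlen: 88 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_520 = (
    "+ iteration 520/ 292968 | consumed samples: 1064960 | consumed tokens: 79462400 | "
    "elapsed time per iteration (ms): 112418.5 | learning rate: 2.840E-05 | "
    "global batch size: 2048 | lm loss: 5.645939E+00 | loss scale: 8192.0 | "
    "grad norm: 17715.201 | num zeros: 0.0 | curriculum seqlen: 88 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

prev_rec = parse_log_line(LINE_519)
next_rec = parse_log_line(LINE_520)

# Per-iteration increments taken directly from the logged counters.
samples_step = next_rec["consumed samples"] - prev_rec["consumed samples"]
tokens_step = next_rec["consumed tokens"] - prev_rec["consumed tokens"]

print(samples_step == next_rec["global batch size"])
print(tokens_step == next_rec["global batch size"] * next_rec["curriculum seqlen"])
```

Lines that are not iteration records, such as the interleaved `time (ms)` lines, make `parse_log_line` return `None`, so a caller can filter them out when scanning the whole file.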