bigscience-bot committed
Commit 6511b94
1 Parent(s): ed982cb
Files changed (1)
  1. logs/main_log.txt +63 -0
logs/main_log.txt CHANGED
@@ -116300,3 +116300,66 @@ time (ms)
  time (ms)
  iteration 3140/ 292968 | consumed samples: 6430720 | consumed tokens: 942702592 | elapsed time per iteration (ms): 109945.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.550672E+00 | loss scale: 131072.0 | grad norm: 55870.403 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
  time (ms)
+ iteration 3141/ 292968 | consumed samples: 6432768 | consumed tokens: 943177728 | elapsed time per iteration (ms): 111833.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.535420E+00 | loss scale: 131072.0 | grad norm: 54687.584 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3142/ 292968 | consumed samples: 6434816 | consumed tokens: 943652864 | elapsed time per iteration (ms): 109935.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.554422E+00 | loss scale: 131072.0 | grad norm: 46354.847 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3143/ 292968 | consumed samples: 6436864 | consumed tokens: 944128000 | elapsed time per iteration (ms): 110450.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.515105E+00 | loss scale: 131072.0 | grad norm: 42457.256 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3144/ 292968 | consumed samples: 6438912 | consumed tokens: 944603136 | elapsed time per iteration (ms): 110392.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.491606E+00 | loss scale: 131072.0 | grad norm: 47675.537 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3145/ 292968 | consumed samples: 6440960 | consumed tokens: 945078272 | elapsed time per iteration (ms): 110165.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.545086E+00 | loss scale: 131072.0 | grad norm: 40437.099 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3146/ 292968 | consumed samples: 6443008 | consumed tokens: 945553408 | elapsed time per iteration (ms): 109112.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519203E+00 | loss scale: 131072.0 | grad norm: 40121.803 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3147/ 292968 | consumed samples: 6445056 | consumed tokens: 946028544 | elapsed time per iteration (ms): 109992.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.507916E+00 | loss scale: 131072.0 | grad norm: 39602.549 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3148/ 292968 | consumed samples: 6447104 | consumed tokens: 946503680 | elapsed time per iteration (ms): 110837.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.501790E+00 | loss scale: 131072.0 | grad norm: 37185.032 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3149/ 292968 | consumed samples: 6449152 | consumed tokens: 946978816 | elapsed time per iteration (ms): 109989.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528901E+00 | loss scale: 131072.0 | grad norm: 44056.823 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3150/ 292968 | consumed samples: 6451200 | consumed tokens: 947453952 | elapsed time per iteration (ms): 110689.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.500736E+00 | loss scale: 131072.0 | grad norm: 34733.114 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ ------------------------------------------------------------------------------------------------
+ validation loss at iteration 3150 | lm loss value: 3.517273E+00 | lm loss PPL: 3.369244E+01 |
+ ------------------------------------------------------------------------------------------------
+ iteration 3151/ 292968 | consumed samples: 6453248 | consumed tokens: 947929088 | elapsed time per iteration (ms): 289717.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.513340E+00 | loss scale: 131072.0 | grad norm: 35613.642 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3152/ 292968 | consumed samples: 6455296 | consumed tokens: 948404224 | elapsed time per iteration (ms): 109415.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519228E+00 | loss scale: 131072.0 | grad norm: 46331.769 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3153/ 292968 | consumed samples: 6457344 | consumed tokens: 948879360 | elapsed time per iteration (ms): 108618.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528655E+00 | loss scale: 131072.0 | grad norm: 62191.264 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3154/ 292968 | consumed samples: 6459392 | consumed tokens: 949354496 | elapsed time per iteration (ms): 109050.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.531178E+00 | loss scale: 131072.0 | grad norm: 55588.878 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3155/ 292968 | consumed samples: 6461440 | consumed tokens: 949829632 | elapsed time per iteration (ms): 111657.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.522779E+00 | loss scale: 131072.0 | grad norm: 44837.393 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3156/ 292968 | consumed samples: 6463488 | consumed tokens: 950304768 | elapsed time per iteration (ms): 110189.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.523057E+00 | loss scale: 131072.0 | grad norm: 43731.420 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3157/ 292968 | consumed samples: 6465536 | consumed tokens: 950779904 | elapsed time per iteration (ms): 110493.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.496690E+00 | loss scale: 131072.0 | grad norm: 46192.470 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3158/ 292968 | consumed samples: 6467584 | consumed tokens: 951255040 | elapsed time per iteration (ms): 109909.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.517199E+00 | loss scale: 131072.0 | grad norm: 31717.912 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3159/ 292968 | consumed samples: 6469632 | consumed tokens: 951730176 | elapsed time per iteration (ms): 110040.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518413E+00 | loss scale: 131072.0 | grad norm: 40340.483 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3160/ 292968 | consumed samples: 6471680 | consumed tokens: 952205312 | elapsed time per iteration (ms): 111087.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519091E+00 | loss scale: 131072.0 | grad norm: 32898.784 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3161/ 292968 | consumed samples: 6473728 | consumed tokens: 952680448 | elapsed time per iteration (ms): 109338.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527358E+00 | loss scale: 131072.0 | grad norm: 34774.966 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3162/ 292968 | consumed samples: 6475776 | consumed tokens: 953155584 | elapsed time per iteration (ms): 108656.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.513849E+00 | loss scale: 131072.0 | grad norm: 39540.117 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3163/ 292968 | consumed samples: 6477824 | consumed tokens: 953630720 | elapsed time per iteration (ms): 109547.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.511124E+00 | loss scale: 131072.0 | grad norm: 48375.830 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3164/ 292968 | consumed samples: 6479872 | consumed tokens: 954105856 | elapsed time per iteration (ms): 113586.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.508611E+00 | loss scale: 131072.0 | grad norm: 52037.682 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3165/ 292968 | consumed samples: 6481920 | consumed tokens: 954580992 | elapsed time per iteration (ms): 114860.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.541578E+00 | loss scale: 131072.0 | grad norm: 41480.973 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3166/ 292968 | consumed samples: 6483968 | consumed tokens: 955056128 | elapsed time per iteration (ms): 121137.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.516208E+00 | loss scale: 131072.0 | grad norm: 41301.397 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3167/ 292968 | consumed samples: 6486016 | consumed tokens: 955531264 | elapsed time per iteration (ms): 110110.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.504046E+00 | loss scale: 131072.0 | grad norm: 47013.136 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3168/ 292968 | consumed samples: 6488064 | consumed tokens: 956006400 | elapsed time per iteration (ms): 110799.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.523125E+00 | loss scale: 131072.0 | grad norm: 53442.123 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3169/ 292968 | consumed samples: 6490112 | consumed tokens: 956481536 | elapsed time per iteration (ms): 109797.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518640E+00 | loss scale: 131072.0 | grad norm: 44658.960 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3170/ 292968 | consumed samples: 6492160 | consumed tokens: 956956672 | elapsed time per iteration (ms): 109397.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.506108E+00 | loss scale: 131072.0 | grad norm: 37584.401 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)