bigscience-bot committed on
Commit
eec5f68
1 Parent(s): aefdfae
Files changed (1)
  1. logs/main_log.txt +56 -0
logs/main_log.txt CHANGED
@@ -106467,3 +106467,59 @@ time (ms)
106467
  time (ms)
106468
  iteration 2488/ 292968 | consumed samples: 5095424 | consumed tokens: 655343616 | elapsed time per iteration (ms): 127594.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699123E+00 | loss scale: 65536.0 | grad norm: 20848.341 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106469
  time (ms)
106470
+ iteration 2489/ 292968 | consumed samples: 5097472 | consumed tokens: 655753216 | elapsed time per iteration (ms): 125631.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.727104E+00 | loss scale: 65536.0 | grad norm: 25658.727 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106471
+ time (ms)
106472
+ iteration 2490/ 292968 | consumed samples: 5099520 | consumed tokens: 656162816 | elapsed time per iteration (ms): 127808.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711399E+00 | loss scale: 65536.0 | grad norm: 24253.847 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106473
+ time (ms)
106474
+ iteration 2491/ 292968 | consumed samples: 5101568 | consumed tokens: 656572416 | elapsed time per iteration (ms): 127414.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729037E+00 | loss scale: 65536.0 | grad norm: 22636.586 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106475
+ time (ms)
106476
+ iteration 2492/ 292968 | consumed samples: 5103616 | consumed tokens: 656982016 | elapsed time per iteration (ms): 126033.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.737489E+00 | loss scale: 65536.0 | grad norm: 20919.173 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106477
+ time (ms)
106478
+ iteration 2493/ 292968 | consumed samples: 5105664 | consumed tokens: 657391616 | elapsed time per iteration (ms): 127754.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.705382E+00 | loss scale: 65536.0 | grad norm: 19622.537 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106479
+ time (ms)
106480
+ iteration 2494/ 292968 | consumed samples: 5107712 | consumed tokens: 657801216 | elapsed time per iteration (ms): 125307.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.728056E+00 | loss scale: 65536.0 | grad norm: 27467.263 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106481
+ time (ms)
106482
+ iteration 2495/ 292968 | consumed samples: 5109760 | consumed tokens: 658210816 | elapsed time per iteration (ms): 126630.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.719722E+00 | loss scale: 65536.0 | grad norm: 34198.566 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106483
+ time (ms)
106484
+ iteration 2496/ 292968 | consumed samples: 5111808 | consumed tokens: 658620416 | elapsed time per iteration (ms): 128610.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718237E+00 | loss scale: 65536.0 | grad norm: 35073.791 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106485
+ time (ms)
106486
+ iteration 2497/ 292968 | consumed samples: 5113856 | consumed tokens: 659030016 | elapsed time per iteration (ms): 126763.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.691556E+00 | loss scale: 65536.0 | grad norm: 29139.525 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106487
+ time (ms)
106488
+ iteration 2498/ 292968 | consumed samples: 5115904 | consumed tokens: 659439616 | elapsed time per iteration (ms): 125641.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.724890E+00 | loss scale: 65536.0 | grad norm: 23439.934 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106489
+ time (ms)
106490
+ iteration 2499/ 292968 | consumed samples: 5117952 | consumed tokens: 659849216 | elapsed time per iteration (ms): 127715.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.716745E+00 | loss scale: 65536.0 | grad norm: 19943.741 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106491
+ time (ms)
106492
+ iteration 2500/ 292968 | consumed samples: 5120000 | consumed tokens: 660258816 | elapsed time per iteration (ms): 128601.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712626E+00 | loss scale: 131072.0 | grad norm: 20295.690 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106493
+ time (ms)
106494
+ iteration 2501/ 292968 | consumed samples: 5122048 | consumed tokens: 660668416 | elapsed time per iteration (ms): 131174.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.713100E+00 | loss scale: 131072.0 | grad norm: 36931.195 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106495
+ time (ms)
106496
+ iteration 2502/ 292968 | consumed samples: 5124096 | consumed tokens: 661078016 | elapsed time per iteration (ms): 131175.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.694582E+00 | loss scale: 131072.0 | grad norm: 49927.205 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106497
+ time (ms)
106498
+ iteration 2503/ 292968 | consumed samples: 5126144 | consumed tokens: 661487616 | elapsed time per iteration (ms): 129343.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.724925E+00 | loss scale: 131072.0 | grad norm: 60177.454 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106499
+ time (ms)
106500
+ iteration 2504/ 292968 | consumed samples: 5128192 | consumed tokens: 661897216 | elapsed time per iteration (ms): 127507.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.701005E+00 | loss scale: 131072.0 | grad norm: 50856.707 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106501
+ time (ms)
106502
+ iteration 2505/ 292968 | consumed samples: 5130240 | consumed tokens: 662306816 | elapsed time per iteration (ms): 129902.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.690974E+00 | loss scale: 131072.0 | grad norm: 53157.344 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106503
+ time (ms)
106504
+ iteration 2506/ 292968 | consumed samples: 5132288 | consumed tokens: 662716416 | elapsed time per iteration (ms): 128518.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.698850E+00 | loss scale: 131072.0 | grad norm: 54977.648 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106505
+ time (ms)
106506
+ iteration 2507/ 292968 | consumed samples: 5134336 | consumed tokens: 663126016 | elapsed time per iteration (ms): 127751.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.727875E+00 | loss scale: 131072.0 | grad norm: 59344.173 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106507
+ time (ms)
106508
+ iteration 2508/ 292968 | consumed samples: 5136384 | consumed tokens: 663535616 | elapsed time per iteration (ms): 130017.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.720293E+00 | loss scale: 131072.0 | grad norm: 45567.528 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106509
+ time (ms)
106510
+ iteration 2509/ 292968 | consumed samples: 5138432 | consumed tokens: 663945216 | elapsed time per iteration (ms): 131387.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712830E+00 | loss scale: 131072.0 | grad norm: 41242.503 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106511
+ time (ms)
106512
+ iteration 2510/ 292968 | consumed samples: 5140480 | consumed tokens: 664354816 | elapsed time per iteration (ms): 131183.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711005E+00 | loss scale: 131072.0 | grad norm: 49437.526 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106513
+ time (ms)
106514
+ iteration 2511/ 292968 | consumed samples: 5142528 | consumed tokens: 664764416 | elapsed time per iteration (ms): 134500.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.709109E+00 | loss scale: 131072.0 | grad norm: 55609.251 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106515
+ time (ms)
106516
+ iteration 2512/ 292968 | consumed samples: 5144576 | consumed tokens: 665174016 | elapsed time per iteration (ms): 135374.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.734447E+00 | loss scale: 131072.0 | grad norm: 43249.036 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106517
+ time (ms)
106518
+ iteration 2513/ 292968 | consumed samples: 5146624 | consumed tokens: 665583616 | elapsed time per iteration (ms): 130829.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699287E+00 | loss scale: 131072.0 | grad norm: 35654.330 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106519
+ time (ms)
106520
+ iteration 2514/ 292968 | consumed samples: 5148672 | consumed tokens: 665993216 | elapsed time per iteration (ms): 133991.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718992E+00 | loss scale: 131072.0 | grad norm: 37759.592 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106521
+ time (ms)
106522
+ iteration 2515/ 292968 | consumed samples: 5150720 | consumed tokens: 666402816 | elapsed time per iteration (ms): 129624.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.719767E+00 | loss scale: 131072.0 | grad norm: 49193.514 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106523
+ time (ms)
106524
+ iteration 2516/ 292968 | consumed samples: 5152768 | consumed tokens: 666812416 | elapsed time per iteration (ms): 126860.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729679E+00 | loss scale: 131072.0 | grad norm: 70559.533 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106525
+ time (ms)
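
The added lines follow a fixed "iteration N/ M | key: value | ..." layout, so they can be checked mechanically. Below is a minimal Python sketch, not part of the commit, that parses two of the lines above and compares the step-to-step deltas. The function name parse_iteration_line and the variable names are hypothetical. The relation "tokens per step = global batch size x curriculum seqlen" is an observation that holds for the numbers logged here, not something the log itself states.

```python
import re

def parse_iteration_line(line):
    """Parse one 'iteration N/ M | key: value | ...' log line into a dict.

    Hypothetical helper: returns None for lines that are not iteration
    records (for example the bare 'time (ms)' lines in this log).
    """
    # Drop the leading '+' that the diff view puts in front of added lines.
    line = line.strip().lstrip("+").strip()
    head, _, rest = line.partition("|")
    match = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)", head.strip())
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }
    for part in rest.split("|"):
        if ":" not in part:
            continue  # skip the empty piece after the trailing '|'
        key, _, value = part.partition(":")
        record[key.strip()] = float(value)
    return record

# Two consecutive records copied from the log above.
prev_line = (
    "iteration 2488/ 292968 | consumed samples: 5095424 | "
    "consumed tokens: 655343616 | elapsed time per iteration (ms): 127594.4 | "
    "learning rate: 1.000E-04 | global batch size: 2048 | "
    "lm loss: 3.699123E+00 | loss scale: 65536.0 | grad norm: 20848.341 | "
    "num zeros: 0.0 | curriculum seqlen: 200 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
next_line = (
    "+ iteration 2489/ 292968 | consumed samples: 5097472 | "
    "consumed tokens: 655753216 | elapsed time per iteration (ms): 125631.4 | "
    "learning rate: 1.000E-04 | global batch size: 2048 | "
    "lm loss: 3.727104E+00 | loss scale: 65536.0 | grad norm: 25658.727 | "
    "num zeros: 0.0 | curriculum seqlen: 200 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

prev_rec = parse_iteration_line(prev_line)
next_rec = parse_iteration_line(next_line)

# Per-step deltas between the two records.
sample_delta = next_rec["consumed samples"] - prev_rec["consumed samples"]
token_delta = next_rec["consumed tokens"] - prev_rec["consumed tokens"]

# Observed relation (an assumption, see the note above the code).
expected_tokens = next_rec["global batch size"] * next_rec["curriculum seqlen"]

print(sample_delta, token_delta, expected_tokens)
```

For these two records the sample delta equals the global batch size (2048) and the token delta equals 2048 x 200 = 409600. The same check can be run over every pair of consecutive iteration lines to spot skipped or duplicated entries.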