bigscience-bot committed on
Commit
da17ed3
1 Parent(s): 068fb88
Files changed (1)
  1. logs/main_log.txt +54 -0
logs/main_log.txt CHANGED
@@ -106523,3 +106523,57 @@ time (ms)
106523
  time (ms)
106524
  iteration 2516/ 292968 | consumed samples: 5152768 | consumed tokens: 666812416 | elapsed time per iteration (ms): 126860.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729679E+00 | loss scale: 131072.0 | grad norm: 70559.533 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106525
  time (ms)
106526
+ iteration 2517/ 292968 | consumed samples: 5154816 | consumed tokens: 667222016 | elapsed time per iteration (ms): 131649.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.677740E+00 | loss scale: 131072.0 | grad norm: 56023.485 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106527
+ time (ms)
106528
+ iteration 2518/ 292968 | consumed samples: 5156864 | consumed tokens: 667631616 | elapsed time per iteration (ms): 133698.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.696238E+00 | loss scale: 131072.0 | grad norm: 57083.392 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106529
+ time (ms)
106530
+ iteration 2519/ 292968 | consumed samples: 5158912 | consumed tokens: 668041216 | elapsed time per iteration (ms): 133130.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712332E+00 | loss scale: 131072.0 | grad norm: 66522.220 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106531
+ time (ms)
106532
+ iteration 2520/ 292968 | consumed samples: 5160960 | consumed tokens: 668450816 | elapsed time per iteration (ms): 132776.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699082E+00 | loss scale: 131072.0 | grad norm: 52981.553 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106533
+ time (ms)
106534
+ iteration 2521/ 292968 | consumed samples: 5163008 | consumed tokens: 668860416 | elapsed time per iteration (ms): 133609.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.677561E+00 | loss scale: 131072.0 | grad norm: 49201.207 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106535
+ time (ms)
106536
+ iteration 2522/ 292968 | consumed samples: 5165056 | consumed tokens: 669270016 | elapsed time per iteration (ms): 134264.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699126E+00 | loss scale: 131072.0 | grad norm: 38187.609 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106537
+ time (ms)
106538
+ iteration 2523/ 292968 | consumed samples: 5167104 | consumed tokens: 669679616 | elapsed time per iteration (ms): 133050.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711785E+00 | loss scale: 131072.0 | grad norm: 50523.507 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106539
+ time (ms)
106540
+ iteration 2524/ 292968 | consumed samples: 5169152 | consumed tokens: 670089216 | elapsed time per iteration (ms): 129836.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.695742E+00 | loss scale: 131072.0 | grad norm: 54330.129 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106541
+ time (ms)
106542
+ iteration 2525/ 292968 | consumed samples: 5171200 | consumed tokens: 670498816 | elapsed time per iteration (ms): 136356.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.732819E+00 | loss scale: 131072.0 | grad norm: 39968.544 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106543
+ time (ms)
106544
+ iteration 2526/ 292968 | consumed samples: 5173248 | consumed tokens: 670908416 | elapsed time per iteration (ms): 134571.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712886E+00 | loss scale: 131072.0 | grad norm: 51363.977 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106545
+ time (ms)
106546
+ iteration 2527/ 292968 | consumed samples: 5175296 | consumed tokens: 671318016 | elapsed time per iteration (ms): 132047.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.695562E+00 | loss scale: 131072.0 | grad norm: 51765.676 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106547
+ time (ms)
106548
+ iteration 2528/ 292968 | consumed samples: 5177344 | consumed tokens: 671727616 | elapsed time per iteration (ms): 134158.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.707468E+00 | loss scale: 131072.0 | grad norm: 54323.308 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106549
+ time (ms)
106550
+ iteration 2529/ 292968 | consumed samples: 5179392 | consumed tokens: 672137216 | elapsed time per iteration (ms): 131022.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.695577E+00 | loss scale: 131072.0 | grad norm: 41546.541 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106551
+ time (ms)
106552
+ iteration 2530/ 292968 | consumed samples: 5181440 | consumed tokens: 672546816 | elapsed time per iteration (ms): 137329.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.701566E+00 | loss scale: 131072.0 | grad norm: 42285.909 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106553
+ time (ms)
106554
+ iteration 2531/ 292968 | consumed samples: 5183488 | consumed tokens: 672956416 | elapsed time per iteration (ms): 135951.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.682348E+00 | loss scale: 131072.0 | grad norm: 55894.421 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106555
+ time (ms)
106556
+ iteration 2532/ 292968 | consumed samples: 5185536 | consumed tokens: 673366016 | elapsed time per iteration (ms): 134684.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.719293E+00 | loss scale: 131072.0 | grad norm: 64429.092 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106557
+ time (ms)
106558
+ iteration 2533/ 292968 | consumed samples: 5187584 | consumed tokens: 673775616 | elapsed time per iteration (ms): 139215.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718337E+00 | loss scale: 131072.0 | grad norm: 49058.682 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106559
+ time (ms)
106560
+ iteration 2534/ 292968 | consumed samples: 5189632 | consumed tokens: 674185216 | elapsed time per iteration (ms): 140178.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.713611E+00 | loss scale: 131072.0 | grad norm: 66713.209 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106561
+ time (ms)
106562
+ iteration 2535/ 292968 | consumed samples: 5191680 | consumed tokens: 674594816 | elapsed time per iteration (ms): 137068.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.720226E+00 | loss scale: 131072.0 | grad norm: 70072.153 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106563
+ time (ms)
106564
+ iteration 2536/ 292968 | consumed samples: 5193728 | consumed tokens: 675004416 | elapsed time per iteration (ms): 133750.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.686594E+00 | loss scale: 131072.0 | grad norm: 47463.962 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106565
+ time (ms)
106566
+ iteration 2537/ 292968 | consumed samples: 5195776 | consumed tokens: 675414016 | elapsed time per iteration (ms): 134502.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718798E+00 | loss scale: 131072.0 | grad norm: 75553.129 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106567
+ time (ms)
106568
+ iteration 2538/ 292968 | consumed samples: 5197824 | consumed tokens: 675823616 | elapsed time per iteration (ms): 132873.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.676980E+00 | loss scale: 131072.0 | grad norm: 72938.459 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106569
+ time (ms)
106570
+ iteration 2539/ 292968 | consumed samples: 5199872 | consumed tokens: 676233216 | elapsed time per iteration (ms): 136842.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.690705E+00 | loss scale: 131072.0 | grad norm: 63805.103 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106571
+ time (ms)
106572
+ iteration 2540/ 292968 | consumed samples: 5201920 | consumed tokens: 676642816 | elapsed time per iteration (ms): 138332.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699246E+00 | loss scale: 131072.0 | grad norm: 60131.574 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106573
+ time (ms)
106574
+ iteration 2541/ 292968 | consumed samples: 5203968 | consumed tokens: 677052416 | elapsed time per iteration (ms): 137209.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.687837E+00 | loss scale: 131072.0 | grad norm: 57555.686 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106575
+ time (ms)
106576
+ iteration 2542/ 292968 | consumed samples: 5206016 | consumed tokens: 677462016 | elapsed time per iteration (ms): 135834.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.714507E+00 | loss scale: 131072.0 | grad norm: 56971.731 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106577
+ time (ms)
106578
+ iteration 2543/ 292968 | consumed samples: 5208064 | consumed tokens: 677871616 | elapsed time per iteration (ms): 133073.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.731538E+00 | loss scale: 131072.0 | grad norm: 53881.397 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106579
+ time (ms)
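Every `iteration` line added in this commit uses the same pipe-separated `key: value` layout. Across consecutive iterations, consumed samples grows by the global batch size (2048), and consumed tokens grows by 409,600, which equals global batch size × curriculum seqlen (2048 × 200). That relationship is inferred from the numbers in this log, not from the training code's documentation.

The following is a minimal Python sketch, assuming that layout holds for every such line. It parses the lines and recomputes those deltas. The helper names (`parse_line`, `token_deltas`) and the `total iterations` key are hypothetical and are not part of the training code.

```python
import re

# Three 'iteration' lines copied from the log above (2516 is context; 2517-2518 were added).
LOG_LINES = [
    "iteration 2516/ 292968 | consumed samples: 5152768 | consumed tokens: 666812416 | elapsed time per iteration (ms): 126860.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729679E+00 | loss scale: 131072.0 | grad norm: 70559.533 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 2517/ 292968 | consumed samples: 5154816 | consumed tokens: 667222016 | elapsed time per iteration (ms): 131649.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.677740E+00 | loss scale: 131072.0 | grad norm: 56023.485 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 2518/ 292968 | consumed samples: 5156864 | consumed tokens: 667631616 | elapsed time per iteration (ms): 133698.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.696238E+00 | loss scale: 131072.0 | grad norm: 57083.392 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def parse_line(line: str) -> dict:
    """Parse one 'iteration N/ TOTAL | key: value | ...' line into a dict.

    Hypothetical helper: assumes the pipe-separated layout seen in this log.
    """
    fields = [f.strip() for f in line.strip().strip("|").split("|") if f.strip()]
    m = re.match(r"iteration\s+(\d+)/\s*(\d+)", fields[0])
    if m is None:
        raise ValueError(f"not an iteration line: {line!r}")
    # 'total iterations' is an illustrative key name, not one used by the log itself.
    record = {"iteration": int(m.group(1)), "total iterations": int(m.group(2))}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        record[key.strip()] = float(value)  # float() tolerates surrounding spaces and E-notation
    return record


def token_deltas(prev: dict, cur: dict) -> tuple:
    """Return (samples consumed, tokens consumed) between two consecutive records."""
    return (
        cur["consumed samples"] - prev["consumed samples"],
        cur["consumed tokens"] - prev["consumed tokens"],
    )


if __name__ == "__main__":
    records = [parse_line(line) for line in LOG_LINES]
    for prev, cur in zip(records, records[1:]):
        d_samples, d_tokens = token_deltas(prev, cur)
        # Inferred from the log: tokens per step = global batch size * curriculum seqlen.
        expected = cur["global batch size"] * cur["curriculum seqlen"]
        seconds = cur["elapsed time per iteration (ms)"] / 1000.0
        print(f"iter {cur['iteration']}: +{d_samples:.0f} samples, +{d_tokens:.0f} tokens "
              f"(expected {expected:.0f}), ~{d_tokens / seconds:.0f} tokens/s")
```

At roughly 130 s per iteration, the same deltas give a throughput on the order of a few thousand tokens per second for this stretch of training.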