bigscience-bot committed on
Commit eec5f68
1 Parent(s): aefdfae
new data
logs/main_log.txt +56 -0
logs/main_log.txt CHANGED
@@ -106467,3 +106467,59 @@ time (ms)
 time (ms)
 iteration 2488/ 292968 | consumed samples: 5095424 | consumed tokens: 655343616 | elapsed time per iteration (ms): 127594.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699123E+00 | loss scale: 65536.0 | grad norm: 20848.341 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 2489/ 292968 | consumed samples: 5097472 | consumed tokens: 655753216 | elapsed time per iteration (ms): 125631.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.727104E+00 | loss scale: 65536.0 | grad norm: 25658.727 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2490/ 292968 | consumed samples: 5099520 | consumed tokens: 656162816 | elapsed time per iteration (ms): 127808.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711399E+00 | loss scale: 65536.0 | grad norm: 24253.847 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2491/ 292968 | consumed samples: 5101568 | consumed tokens: 656572416 | elapsed time per iteration (ms): 127414.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729037E+00 | loss scale: 65536.0 | grad norm: 22636.586 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2492/ 292968 | consumed samples: 5103616 | consumed tokens: 656982016 | elapsed time per iteration (ms): 126033.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.737489E+00 | loss scale: 65536.0 | grad norm: 20919.173 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2493/ 292968 | consumed samples: 5105664 | consumed tokens: 657391616 | elapsed time per iteration (ms): 127754.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.705382E+00 | loss scale: 65536.0 | grad norm: 19622.537 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2494/ 292968 | consumed samples: 5107712 | consumed tokens: 657801216 | elapsed time per iteration (ms): 125307.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.728056E+00 | loss scale: 65536.0 | grad norm: 27467.263 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2495/ 292968 | consumed samples: 5109760 | consumed tokens: 658210816 | elapsed time per iteration (ms): 126630.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.719722E+00 | loss scale: 65536.0 | grad norm: 34198.566 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2496/ 292968 | consumed samples: 5111808 | consumed tokens: 658620416 | elapsed time per iteration (ms): 128610.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718237E+00 | loss scale: 65536.0 | grad norm: 35073.791 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2497/ 292968 | consumed samples: 5113856 | consumed tokens: 659030016 | elapsed time per iteration (ms): 126763.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.691556E+00 | loss scale: 65536.0 | grad norm: 29139.525 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2498/ 292968 | consumed samples: 5115904 | consumed tokens: 659439616 | elapsed time per iteration (ms): 125641.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.724890E+00 | loss scale: 65536.0 | grad norm: 23439.934 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2499/ 292968 | consumed samples: 5117952 | consumed tokens: 659849216 | elapsed time per iteration (ms): 127715.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.716745E+00 | loss scale: 65536.0 | grad norm: 19943.741 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2500/ 292968 | consumed samples: 5120000 | consumed tokens: 660258816 | elapsed time per iteration (ms): 128601.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712626E+00 | loss scale: 131072.0 | grad norm: 20295.690 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2501/ 292968 | consumed samples: 5122048 | consumed tokens: 660668416 | elapsed time per iteration (ms): 131174.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.713100E+00 | loss scale: 131072.0 | grad norm: 36931.195 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2502/ 292968 | consumed samples: 5124096 | consumed tokens: 661078016 | elapsed time per iteration (ms): 131175.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.694582E+00 | loss scale: 131072.0 | grad norm: 49927.205 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2503/ 292968 | consumed samples: 5126144 | consumed tokens: 661487616 | elapsed time per iteration (ms): 129343.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.724925E+00 | loss scale: 131072.0 | grad norm: 60177.454 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2504/ 292968 | consumed samples: 5128192 | consumed tokens: 661897216 | elapsed time per iteration (ms): 127507.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.701005E+00 | loss scale: 131072.0 | grad norm: 50856.707 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2505/ 292968 | consumed samples: 5130240 | consumed tokens: 662306816 | elapsed time per iteration (ms): 129902.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.690974E+00 | loss scale: 131072.0 | grad norm: 53157.344 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2506/ 292968 | consumed samples: 5132288 | consumed tokens: 662716416 | elapsed time per iteration (ms): 128518.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.698850E+00 | loss scale: 131072.0 | grad norm: 54977.648 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2507/ 292968 | consumed samples: 5134336 | consumed tokens: 663126016 | elapsed time per iteration (ms): 127751.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.727875E+00 | loss scale: 131072.0 | grad norm: 59344.173 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2508/ 292968 | consumed samples: 5136384 | consumed tokens: 663535616 | elapsed time per iteration (ms): 130017.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.720293E+00 | loss scale: 131072.0 | grad norm: 45567.528 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2509/ 292968 | consumed samples: 5138432 | consumed tokens: 663945216 | elapsed time per iteration (ms): 131387.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.712830E+00 | loss scale: 131072.0 | grad norm: 41242.503 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2510/ 292968 | consumed samples: 5140480 | consumed tokens: 664354816 | elapsed time per iteration (ms): 131183.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711005E+00 | loss scale: 131072.0 | grad norm: 49437.526 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2511/ 292968 | consumed samples: 5142528 | consumed tokens: 664764416 | elapsed time per iteration (ms): 134500.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.709109E+00 | loss scale: 131072.0 | grad norm: 55609.251 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2512/ 292968 | consumed samples: 5144576 | consumed tokens: 665174016 | elapsed time per iteration (ms): 135374.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.734447E+00 | loss scale: 131072.0 | grad norm: 43249.036 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2513/ 292968 | consumed samples: 5146624 | consumed tokens: 665583616 | elapsed time per iteration (ms): 130829.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.699287E+00 | loss scale: 131072.0 | grad norm: 35654.330 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2514/ 292968 | consumed samples: 5148672 | consumed tokens: 665993216 | elapsed time per iteration (ms): 133991.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.718992E+00 | loss scale: 131072.0 | grad norm: 37759.592 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2515/ 292968 | consumed samples: 5150720 | consumed tokens: 666402816 | elapsed time per iteration (ms): 129624.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.719767E+00 | loss scale: 131072.0 | grad norm: 49193.514 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 2516/ 292968 | consumed samples: 5152768 | consumed tokens: 666812416 | elapsed time per iteration (ms): 126860.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.729679E+00 | loss scale: 131072.0 | grad norm: 70559.533 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
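Each iteration line in this log follows a fixed `key: value | key: value` layout, so it can be parsed mechanically. The Python sketch below is illustrative only. The helper `parse_iteration_line` and the key `total_iterations` are hypothetical names, not part of the training code. The check that tokens consumed per step equal global batch size × curriculum seqlen is an assumption. It matches these lines (2048 × 200 = 409600, the step between successive `consumed tokens` values), but the log does not state it.

```python
import re

# Iteration lines 2499 and 2500, copied verbatim from logs/main_log.txt above.
LOG_LINES = [
    "iteration 2499/ 292968 | consumed samples: 5117952 | consumed tokens: 659849216 | "
    "elapsed time per iteration (ms): 127715.2 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.716745E+00 | loss scale: 65536.0 | "
    "grad norm: 19943.741 | num zeros: 0.0 | curriculum seqlen: 200 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 2500/ 292968 | consumed samples: 5120000 | consumed tokens: 660258816 | "
    "elapsed time per iteration (ms): 128601.0 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.712626E+00 | loss scale: 131072.0 | "
    "grad norm: 20295.690 | num zeros: 0.0 | curriculum seqlen: 200 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |",
]

# Matches the leading "iteration <current>/ <total>" field.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def parse_iteration_line(line):
    """Hypothetical helper: turn one iteration line into a dict of numbers."""
    # Split on the pipe separators and drop the empty trailing field.
    fields = [f.strip() for f in line.split("|") if f.strip()]
    match = ITER_RE.match(fields[0])
    if match is None:
        raise ValueError(f"not an iteration line: {line[:40]!r}")
    record = {
        "iteration": int(match.group(1)),
        # Hypothetical key name for the number printed after the slash.
        "total_iterations": int(match.group(2)),
    }
    # Every remaining field is "name: number"; float() accepts 1.000E-04 etc.
    for field in fields[1:]:
        key, _, value = field.partition(":")
        record[key.strip()] = float(value)
    return record


if __name__ == "__main__":
    prev, curr = [parse_iteration_line(line) for line in LOG_LINES]

    # Assumption: tokens per step = global batch size * curriculum seqlen.
    step_tokens = curr["consumed tokens"] - prev["consumed tokens"]
    expected = curr["global batch size"] * curr["curriculum seqlen"]
    print(f"tokens per step: {step_tokens:.0f} (expected {expected:.0f})")

    # Report any change in the fp16 loss scale between the two steps.
    if curr["loss scale"] != prev["loss scale"]:
        print(
            f"loss scale {prev['loss scale']:.0f} -> {curr['loss scale']:.0f} "
            f"at iteration {curr['iteration']}"
        )
```

Between iterations 2499 and 2500 the loss scale doubles from 65536 to 131072. This is consistent with fp16 dynamic loss scaling, which raises the scale after a run of overflow-free steps. The window length used in this run is not shown in this excerpt.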