bigscience-bot
committed on
Commit
•
0a86c3e
1
Parent(s):
18ed1f8
new data
logs/main_log.txt +64 -0
logs/main_log.txt
CHANGED
@@ -124878,3 +124878,67 @@ time (ms)
124878 |
time (ms)
|
124879 |
iteration 3266/ 292968 | consumed samples: 6688768 | consumed tokens: 1003765760 | elapsed time per iteration (ms): 111087.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498115E+00 | loss scale: 131072.0 | grad norm: 40484.037 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124880 |
time (ms)
|
124881 |
+
iteration 3267/ 292968 | consumed samples: 6690816 | consumed tokens: 1004257280 | elapsed time per iteration (ms): 112981.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.485603E+00 | loss scale: 131072.0 | grad norm: 35172.870 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124882 |
+
time (ms)
|
124883 |
+
iteration 3268/ 292968 | consumed samples: 6692864 | consumed tokens: 1004748800 | elapsed time per iteration (ms): 112046.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.503227E+00 | loss scale: 131072.0 | grad norm: 36791.981 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124884 |
+
time (ms)
|
124885 |
+
iteration 3269/ 292968 | consumed samples: 6694912 | consumed tokens: 1005240320 | elapsed time per iteration (ms): 110197.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.492297E+00 | loss scale: 131072.0 | grad norm: 39721.467 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124886 |
+
time (ms)
|
124887 |
+
iteration 3270/ 292968 | consumed samples: 6696960 | consumed tokens: 1005731840 | elapsed time per iteration (ms): 110041.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.465833E+00 | loss scale: 131072.0 | grad norm: 41592.190 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124888 |
+
time (ms)
|
124889 |
+
iteration 3271/ 292968 | consumed samples: 6699008 | consumed tokens: 1006223360 | elapsed time per iteration (ms): 110297.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.520051E+00 | loss scale: 131072.0 | grad norm: 38770.837 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124890 |
+
time (ms)
|
124891 |
+
iteration 3272/ 292968 | consumed samples: 6701056 | consumed tokens: 1006714880 | elapsed time per iteration (ms): 113682.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.532229E+00 | loss scale: 131072.0 | grad norm: 46863.674 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124892 |
+
time (ms)
|
124893 |
+
iteration 3273/ 292968 | consumed samples: 6703104 | consumed tokens: 1007206400 | elapsed time per iteration (ms): 115764.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.487801E+00 | loss scale: 131072.0 | grad norm: 47275.617 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124894 |
+
time (ms)
|
124895 |
+
iteration 3274/ 292968 | consumed samples: 6705152 | consumed tokens: 1007697920 | elapsed time per iteration (ms): 113611.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.499582E+00 | loss scale: 131072.0 | grad norm: 43028.621 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124896 |
+
time (ms)
|
124897 |
+
iteration 3275/ 292968 | consumed samples: 6707200 | consumed tokens: 1008189440 | elapsed time per iteration (ms): 111135.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.499293E+00 | loss scale: 131072.0 | grad norm: 43217.821 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124898 |
+
time (ms)
|
124899 |
+
iteration 3276/ 292968 | consumed samples: 6709248 | consumed tokens: 1008680960 | elapsed time per iteration (ms): 111398.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.495284E+00 | loss scale: 131072.0 | grad norm: 35376.715 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124900 |
+
time (ms)
|
124901 |
+
iteration 3277/ 292968 | consumed samples: 6711296 | consumed tokens: 1009172480 | elapsed time per iteration (ms): 112414.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.483550E+00 | loss scale: 131072.0 | grad norm: 34250.645 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124902 |
+
time (ms)
|
124903 |
+
iteration 3278/ 292968 | consumed samples: 6713344 | consumed tokens: 1009664000 | elapsed time per iteration (ms): 111344.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.486138E+00 | loss scale: 131072.0 | grad norm: 30434.955 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124904 |
+
time (ms)
|
124905 |
+
iteration 3279/ 292968 | consumed samples: 6715392 | consumed tokens: 1010155520 | elapsed time per iteration (ms): 112060.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.498874E+00 | loss scale: 131072.0 | grad norm: 29348.389 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124906 |
+
time (ms)
|
124907 |
+
iteration 3280/ 292968 | consumed samples: 6717440 | consumed tokens: 1010647040 | elapsed time per iteration (ms): 112820.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.497196E+00 | loss scale: 131072.0 | grad norm: 29673.133 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124908 |
+
time (ms)
|
124909 |
+
iteration 3281/ 292968 | consumed samples: 6719488 | consumed tokens: 1011138560 | elapsed time per iteration (ms): 111234.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.499080E+00 | loss scale: 131072.0 | grad norm: 40415.963 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124910 |
+
time (ms)
|
124911 |
+
iteration 3282/ 292968 | consumed samples: 6721536 | consumed tokens: 1011630080 | elapsed time per iteration (ms): 111552.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.491541E+00 | loss scale: 131072.0 | grad norm: 57029.381 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124912 |
+
time (ms)
|
124913 |
+
iteration 3283/ 292968 | consumed samples: 6723584 | consumed tokens: 1012121600 | elapsed time per iteration (ms): 112426.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.497360E+00 | loss scale: 131072.0 | grad norm: 59242.468 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124914 |
+
time (ms)
|
124915 |
+
iteration 3284/ 292968 | consumed samples: 6725632 | consumed tokens: 1012613120 | elapsed time per iteration (ms): 111149.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.476556E+00 | loss scale: 131072.0 | grad norm: 45191.526 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124916 |
+
time (ms)
|
124917 |
+
iteration 3285/ 292968 | consumed samples: 6727680 | consumed tokens: 1013104640 | elapsed time per iteration (ms): 113840.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.492275E+00 | loss scale: 131072.0 | grad norm: 36899.796 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124918 |
+
time (ms)
|
124919 |
+
iteration 3286/ 292968 | consumed samples: 6729728 | consumed tokens: 1013596160 | elapsed time per iteration (ms): 113981.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471767E+00 | loss scale: 131072.0 | grad norm: 42014.104 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124920 |
+
time (ms)
|
124921 |
+
iteration 3287/ 292968 | consumed samples: 6731776 | consumed tokens: 1014087680 | elapsed time per iteration (ms): 113840.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.475223E+00 | loss scale: 131072.0 | grad norm: 45709.099 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124922 |
+
time (ms)
|
124923 |
+
iteration 3288/ 292968 | consumed samples: 6733824 | consumed tokens: 1014579200 | elapsed time per iteration (ms): 112154.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.503000E+00 | loss scale: 131072.0 | grad norm: 46516.672 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124924 |
+
time (ms)
|
124925 |
+
iteration 3289/ 292968 | consumed samples: 6735872 | consumed tokens: 1015070720 | elapsed time per iteration (ms): 110548.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.484241E+00 | loss scale: 131072.0 | grad norm: 37206.769 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124926 |
+
time (ms)
|
124927 |
+
iteration 3290/ 292968 | consumed samples: 6737920 | consumed tokens: 1015562240 | elapsed time per iteration (ms): 112012.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.478825E+00 | loss scale: 131072.0 | grad norm: 39774.517 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124928 |
+
time (ms)
|
124929 |
+
iteration 3291/ 292968 | consumed samples: 6739968 | consumed tokens: 1016053760 | elapsed time per iteration (ms): 110410.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.495184E+00 | loss scale: 131072.0 | grad norm: 38254.934 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124930 |
+
time (ms)
|
124931 |
+
iteration 3292/ 292968 | consumed samples: 6742016 | consumed tokens: 1016545280 | elapsed time per iteration (ms): 111588.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.488030E+00 | loss scale: 131072.0 | grad norm: 43122.399 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124932 |
+
time (ms)
|
124933 |
+
iteration 3293/ 292968 | consumed samples: 6744064 | consumed tokens: 1017036800 | elapsed time per iteration (ms): 110742.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.512937E+00 | loss scale: 131072.0 | grad norm: 42031.635 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124934 |
+
time (ms)
|
124935 |
+
iteration 3294/ 292968 | consumed samples: 6746112 | consumed tokens: 1017528320 | elapsed time per iteration (ms): 112447.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.472189E+00 | loss scale: 131072.0 | grad norm: 44968.571 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124936 |
+
time (ms)
|
124937 |
+
iteration 3295/ 292968 | consumed samples: 6748160 | consumed tokens: 1018019840 | elapsed time per iteration (ms): 111572.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.486163E+00 | loss scale: 131072.0 | grad norm: 46456.832 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124938 |
+
time (ms)
|
124939 |
+
iteration 3296/ 292968 | consumed samples: 6750208 | consumed tokens: 1018511360 | elapsed time per iteration (ms): 112407.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.476424E+00 | loss scale: 131072.0 | grad norm: 36053.245 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124940 |
+
time (ms)
|
124941 |
+
iteration 3297/ 292968 | consumed samples: 6752256 | consumed tokens: 1019002880 | elapsed time per iteration (ms): 111913.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.471766E+00 | loss scale: 131072.0 | grad norm: 44322.924 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124942 |
+
time (ms)
|
124943 |
+
iteration 3298/ 292968 | consumed samples: 6754304 | consumed tokens: 1019494400 | elapsed time per iteration (ms): 112625.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.473320E+00 | loss scale: 131072.0 | grad norm: 50050.388 | num zeros: 0.0 | curriculum seqlen: 240 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
124944 |
+
time (ms)
|
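The added log lines share one pipe-delimited `key: value` layout, so they can be split into fields for a quick sanity check. The following is a minimal sketch, not part of the commit: the helper name `parse_iteration_line` is hypothetical, and the two sample lines are copied from iterations 3266 and 3267 above. It assumes every field except the leading `iteration N/ M` entry is numeric.

```python
import re

# Two lines copied verbatim from the log above (iterations 3266 and 3267).
LINE_3266 = (
    " iteration 3266/ 292968 | consumed samples: 6688768 | consumed tokens: 1003765760 | "
    "elapsed time per iteration (ms): 111087.4 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.498115E+00 | loss scale: 131072.0 | "
    "grad norm: 40484.037 | num zeros: 0.0 | curriculum seqlen: 240 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_3267 = (
    " iteration 3267/ 292968 | consumed samples: 6690816 | consumed tokens: 1004257280 | "
    "elapsed time per iteration (ms): 112981.9 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.485603E+00 | loss scale: 131072.0 | "
    "grad norm: 35172.870 | num zeros: 0.0 | curriculum seqlen: 240 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)


def parse_iteration_line(line):
    """Hypothetical helper: split one 'iteration ...' log line into a dict of numbers."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("iteration"):
            # The first field has the form "iteration 3266/ 292968".
            match = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)", part)
            if match:
                fields["iteration"] = int(match.group(1))
                fields["total iterations"] = int(match.group(2))
            continue
        # Split on the last colon so keys like "elapsed time per iteration (ms)" stay whole.
        key, sep, value = part.rpartition(":")
        if sep:
            fields[key.strip()] = float(value)
    return fields


prev_step = parse_iteration_line(LINE_3266)
next_step = parse_iteration_line(LINE_3267)

# Tokens consumed in one step should equal global batch size times curriculum seqlen.
token_delta = next_step["consumed tokens"] - prev_step["consumed tokens"]
expected_delta = next_step["global batch size"] * next_step["curriculum seqlen"]
print(token_delta, expected_delta)
```

For these two lines both values come out to 491520 (2048 samples of 240 tokens each), which matches the per-iteration increase in `consumed tokens` across the rest of the added lines.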