bigscience-bot
committed on
Commit
·
0d1dc9e
1
Parent(s):
4aed75d
new data
- logs/main_log.txt +92 -0
logs/main_log.txt
CHANGED
@@ -67181,3 +67181,95 @@ time (ms)
67181 |
time (ms)
|
67182 |
iteration 719/ 292968 | consumed samples: 1472512 | consumed tokens: 117604352 | elapsed time per iteration (ms): 75603.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67183 |
time (ms)
|
67184 |
+
iteration 720/ 292968 | consumed samples: 1474560 | consumed tokens: 117800960 | elapsed time per iteration (ms): 77618.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67185 |
+
time (ms)
|
67186 |
+
iteration 721/ 292968 | consumed samples: 1476608 | consumed tokens: 117997568 | elapsed time per iteration (ms): 76350.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67187 |
+
time (ms)
|
67188 |
+
iteration 722/ 292968 | consumed samples: 1478656 | consumed tokens: 118194176 | elapsed time per iteration (ms): 75529.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67189 |
+
time (ms)
|
67190 |
+
iteration 723/ 292968 | consumed samples: 1480704 | consumed tokens: 118390784 | elapsed time per iteration (ms): 76634.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67191 |
+
time (ms)
|
67192 |
+
iteration 724/ 292968 | consumed samples: 1482752 | consumed tokens: 118587392 | elapsed time per iteration (ms): 76610.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67193 |
+
time (ms)
|
67194 |
+
iteration 725/ 292968 | consumed samples: 1484800 | consumed tokens: 118784000 | elapsed time per iteration (ms): 76137.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67195 |
+
time (ms)
|
67196 |
+
iteration 726/ 292968 | consumed samples: 1486848 | consumed tokens: 118996992 | elapsed time per iteration (ms): 78329.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67197 |
+
time (ms)
|
67198 |
+
iteration 727/ 292968 | consumed samples: 1488896 | consumed tokens: 119209984 | elapsed time per iteration (ms): 79337.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67199 |
+
time (ms)
|
67200 |
+
iteration 728/ 292968 | consumed samples: 1490944 | consumed tokens: 119422976 | elapsed time per iteration (ms): 77771.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67201 |
+
time (ms)
|
67202 |
+
iteration 729/ 292968 | consumed samples: 1492992 | consumed tokens: 119635968 | elapsed time per iteration (ms): 79374.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67203 |
+
time (ms)
|
67204 |
+
iteration 730/ 292968 | consumed samples: 1495040 | consumed tokens: 119848960 | elapsed time per iteration (ms): 78461.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67205 |
+
time (ms)
|
67206 |
+
iteration 731/ 292968 | consumed samples: 1497088 | consumed tokens: 120061952 | elapsed time per iteration (ms): 78942.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67207 |
+
time (ms)
|
67208 |
+
iteration 732/ 292968 | consumed samples: 1499136 | consumed tokens: 120274944 | elapsed time per iteration (ms): 79955.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67209 |
+
time (ms)
|
67210 |
+
iteration 733/ 292968 | consumed samples: 1501184 | consumed tokens: 120487936 | elapsed time per iteration (ms): 79427.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67211 |
+
time (ms)
|
67212 |
+
iteration 734/ 292968 | consumed samples: 1503232 | consumed tokens: 120700928 | elapsed time per iteration (ms): 79713.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67213 |
+
time (ms)
|
67214 |
+
iteration 735/ 292968 | consumed samples: 1505280 | consumed tokens: 120913920 | elapsed time per iteration (ms): 77863.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67215 |
+
time (ms)
|
67216 |
+
iteration 736/ 292968 | consumed samples: 1507328 | consumed tokens: 121126912 | elapsed time per iteration (ms): 78405.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67217 |
+
time (ms)
|
67218 |
+
iteration 737/ 292968 | consumed samples: 1509376 | consumed tokens: 121339904 | elapsed time per iteration (ms): 78191.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67219 |
+
time (ms)
|
67220 |
+
iteration 738/ 292968 | consumed samples: 1511424 | consumed tokens: 121552896 | elapsed time per iteration (ms): 77427.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67221 |
+
time (ms)
|
67222 |
+
iteration 739/ 292968 | consumed samples: 1513472 | consumed tokens: 121765888 | elapsed time per iteration (ms): 77339.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67223 |
+
time (ms)
|
67224 |
+
iteration 740/ 292968 | consumed samples: 1515520 | consumed tokens: 121978880 | elapsed time per iteration (ms): 77282.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67225 |
+
time (ms)
|
67226 |
+
iteration 741/ 292968 | consumed samples: 1517568 | consumed tokens: 122191872 | elapsed time per iteration (ms): 78543.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67227 |
+
time (ms)
|
67228 |
+
iteration 742/ 292968 | consumed samples: 1519616 | consumed tokens: 122404864 | elapsed time per iteration (ms): 78583.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67229 |
+
time (ms)
|
67230 |
+
iteration 743/ 292968 | consumed samples: 1521664 | consumed tokens: 122617856 | elapsed time per iteration (ms): 77734.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67231 |
+
time (ms)
|
67232 |
+
iteration 744/ 292968 | consumed samples: 1523712 | consumed tokens: 122830848 | elapsed time per iteration (ms): 78005.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67233 |
+
time (ms)
|
67234 |
+
iteration 745/ 292968 | consumed samples: 1525760 | consumed tokens: 123043840 | elapsed time per iteration (ms): 78154.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67235 |
+
time (ms)
|
67236 |
+
iteration 746/ 292968 | consumed samples: 1527808 | consumed tokens: 123256832 | elapsed time per iteration (ms): 79098.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67237 |
+
time (ms)
|
67238 |
+
iteration 747/ 292968 | consumed samples: 1529856 | consumed tokens: 123469824 | elapsed time per iteration (ms): 76901.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67239 |
+
time (ms)
|
67240 |
+
iteration 748/ 292968 | consumed samples: 1531904 | consumed tokens: 123682816 | elapsed time per iteration (ms): 78364.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67241 |
+
time (ms)
|
67242 |
+
iteration 749/ 292968 | consumed samples: 1533952 | consumed tokens: 123895808 | elapsed time per iteration (ms): 77745.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67243 |
+
time (ms)
|
67244 |
+
iteration 750/ 292968 | consumed samples: 1536000 | consumed tokens: 124108800 | elapsed time per iteration (ms): 76993.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67245 |
+
time (ms)
|
67246 |
+
iteration 751/ 292968 | consumed samples: 1538048 | consumed tokens: 124321792 | elapsed time per iteration (ms): 78065.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67247 |
+
time (ms)
|
67248 |
+
iteration 752/ 292968 | consumed samples: 1540096 | consumed tokens: 124534784 | elapsed time per iteration (ms): 78716.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67249 |
+
time (ms)
|
67250 |
+
iteration 753/ 292968 | consumed samples: 1542144 | consumed tokens: 124747776 | elapsed time per iteration (ms): 78297.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67251 |
+
time (ms)
|
67252 |
+
iteration 754/ 292968 | consumed samples: 1544192 | consumed tokens: 124960768 | elapsed time per iteration (ms): 81533.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67253 |
+
time (ms)
|
67254 |
+
iteration 755/ 292968 | consumed samples: 1546240 | consumed tokens: 125173760 | elapsed time per iteration (ms): 77260.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67255 |
+
time (ms)
|
67256 |
+
iteration 756/ 292968 | consumed samples: 1548288 | consumed tokens: 125386752 | elapsed time per iteration (ms): 77380.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67257 |
+
time (ms)
|
67258 |
+
iteration 757/ 292968 | consumed samples: 1550336 | consumed tokens: 125599744 | elapsed time per iteration (ms): 78639.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67259 |
+
time (ms)
|
67260 |
+
iteration 758/ 292968 | consumed samples: 1552384 | consumed tokens: 125812736 | elapsed time per iteration (ms): 78547.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67261 |
+
time (ms)
|
67262 |
+
iteration 759/ 292968 | consumed samples: 1554432 | consumed tokens: 126025728 | elapsed time per iteration (ms): 78637.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67263 |
+
time (ms)
|
67264 |
+
iteration 760/ 292968 | consumed samples: 1556480 | consumed tokens: 126238720 | elapsed time per iteration (ms): 76681.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67265 |
+
time (ms)
|
67266 |
+
iteration 761/ 292968 | consumed samples: 1558528 | consumed tokens: 126451712 | elapsed time per iteration (ms): 78835.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67267 |
+
time (ms)
|
67268 |
+
iteration 762/ 292968 | consumed samples: 1560576 | consumed tokens: 126664704 | elapsed time per iteration (ms): 78476.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67269 |
+
time (ms)
|
67270 |
+
iteration 763/ 292968 | consumed samples: 1562624 | consumed tokens: 126877696 | elapsed time per iteration (ms): 80815.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67271 |
+
time (ms)
|
67272 |
+
iteration 764/ 292968 | consumed samples: 1564672 | consumed tokens: 127090688 | elapsed time per iteration (ms): 78990.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67273 |
+
time (ms)
|
67274 |
+
iteration 765/ 292968 | consumed samples: 1566720 | consumed tokens: 127303680 | elapsed time per iteration (ms): 76814.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67275 |
+
time (ms)
|
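The added log lines are consistent with `consumed tokens` growing by `global batch size` × `curriculum seqlen` per iteration (2048 × 96 = 196608 up to iteration 725, then 2048 × 104 = 212992 from iteration 726). The sketch below is one way to check that relationship. It is not part of the training code: `parse_log_line` is a hypothetical helper written for this illustration, and the field layout is assumed from the lines in this diff.

```python
import re

# Two lines copied verbatim from the diff above (iterations 725 and 726).
LOG_LINES = [
    "iteration 725/ 292968 | consumed samples: 1484800 | consumed tokens: 118784000 | elapsed time per iteration (ms): 76137.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 726/ 292968 | consumed samples: 1486848 | consumed tokens: 118996992 | elapsed time per iteration (ms): 78329.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def to_number(text):
    """Return an int when the text is an integer literal, else a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_log_line(line):
    """Parse one 'iteration ...' line into a dict (hypothetical helper)."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("iteration"):
            # The first field looks like "iteration 725/ 292968".
            match = re.match(r"iteration\s+(\d+)/\s*(\d+)", part)
            fields["iteration"] = int(match.group(1))
            fields["total iterations"] = int(match.group(2))
        else:
            # Split on the last colon so keys such as
            # "elapsed time per iteration (ms)" stay intact.
            key, _, value = part.rpartition(":")
            fields[key.strip()] = to_number(value.strip())
    return fields


previous, current = (parse_log_line(line) for line in LOG_LINES)

# Tokens added in this step versus batch size times the current sequence length.
token_delta = current["consumed tokens"] - previous["consumed tokens"]
expected_delta = current["global batch size"] * current["curriculum seqlen"]
sample_delta = current["consumed samples"] - previous["consumed samples"]

print(token_delta, expected_delta, sample_delta)
```

Running the same comparison over every pair of consecutive lines would flag any step where the token count and the curriculum sequence length disagree.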