bigscience-bot committed on
Commit 0d1dc9e · 1 Parent(s): 4aed75d
Files changed (1)
  1. logs/main_log.txt +92 -0
logs/main_log.txt CHANGED
@@ -67181,3 +67181,95 @@ time (ms)
67181
  time (ms)
67182
  iteration 719/ 292968 | consumed samples: 1472512 | consumed tokens: 117604352 | elapsed time per iteration (ms): 75603.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67183
  time (ms)
67184
+ iteration 720/ 292968 | consumed samples: 1474560 | consumed tokens: 117800960 | elapsed time per iteration (ms): 77618.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67185
+ time (ms)
67186
+ iteration 721/ 292968 | consumed samples: 1476608 | consumed tokens: 117997568 | elapsed time per iteration (ms): 76350.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67187
+ time (ms)
67188
+ iteration 722/ 292968 | consumed samples: 1478656 | consumed tokens: 118194176 | elapsed time per iteration (ms): 75529.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67189
+ time (ms)
67190
+ iteration 723/ 292968 | consumed samples: 1480704 | consumed tokens: 118390784 | elapsed time per iteration (ms): 76634.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67191
+ time (ms)
67192
+ iteration 724/ 292968 | consumed samples: 1482752 | consumed tokens: 118587392 | elapsed time per iteration (ms): 76610.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67193
+ time (ms)
67194
+ iteration 725/ 292968 | consumed samples: 1484800 | consumed tokens: 118784000 | elapsed time per iteration (ms): 76137.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |
67195
+ time (ms)
67196
+ iteration 726/ 292968 | consumed samples: 1486848 | consumed tokens: 118996992 | elapsed time per iteration (ms): 78329.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67197
+ time (ms)
67198
+ iteration 727/ 292968 | consumed samples: 1488896 | consumed tokens: 119209984 | elapsed time per iteration (ms): 79337.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67199
+ time (ms)
67200
+ iteration 728/ 292968 | consumed samples: 1490944 | consumed tokens: 119422976 | elapsed time per iteration (ms): 77771.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67201
+ time (ms)
67202
+ iteration 729/ 292968 | consumed samples: 1492992 | consumed tokens: 119635968 | elapsed time per iteration (ms): 79374.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67203
+ time (ms)
67204
+ iteration 730/ 292968 | consumed samples: 1495040 | consumed tokens: 119848960 | elapsed time per iteration (ms): 78461.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67205
+ time (ms)
67206
+ iteration 731/ 292968 | consumed samples: 1497088 | consumed tokens: 120061952 | elapsed time per iteration (ms): 78942.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67207
+ time (ms)
67208
+ iteration 732/ 292968 | consumed samples: 1499136 | consumed tokens: 120274944 | elapsed time per iteration (ms): 79955.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67209
+ time (ms)
67210
+ iteration 733/ 292968 | consumed samples: 1501184 | consumed tokens: 120487936 | elapsed time per iteration (ms): 79427.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67211
+ time (ms)
67212
+ iteration 734/ 292968 | consumed samples: 1503232 | consumed tokens: 120700928 | elapsed time per iteration (ms): 79713.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67213
+ time (ms)
67214
+ iteration 735/ 292968 | consumed samples: 1505280 | consumed tokens: 120913920 | elapsed time per iteration (ms): 77863.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67215
+ time (ms)
67216
+ iteration 736/ 292968 | consumed samples: 1507328 | consumed tokens: 121126912 | elapsed time per iteration (ms): 78405.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67217
+ time (ms)
67218
+ iteration 737/ 292968 | consumed samples: 1509376 | consumed tokens: 121339904 | elapsed time per iteration (ms): 78191.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67219
+ time (ms)
67220
+ iteration 738/ 292968 | consumed samples: 1511424 | consumed tokens: 121552896 | elapsed time per iteration (ms): 77427.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67221
+ time (ms)
67222
+ iteration 739/ 292968 | consumed samples: 1513472 | consumed tokens: 121765888 | elapsed time per iteration (ms): 77339.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67223
+ time (ms)
67224
+ iteration 740/ 292968 | consumed samples: 1515520 | consumed tokens: 121978880 | elapsed time per iteration (ms): 77282.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67225
+ time (ms)
67226
+ iteration 741/ 292968 | consumed samples: 1517568 | consumed tokens: 122191872 | elapsed time per iteration (ms): 78543.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67227
+ time (ms)
67228
+ iteration 742/ 292968 | consumed samples: 1519616 | consumed tokens: 122404864 | elapsed time per iteration (ms): 78583.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67229
+ time (ms)
67230
+ iteration 743/ 292968 | consumed samples: 1521664 | consumed tokens: 122617856 | elapsed time per iteration (ms): 77734.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67231
+ time (ms)
67232
+ iteration 744/ 292968 | consumed samples: 1523712 | consumed tokens: 122830848 | elapsed time per iteration (ms): 78005.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67233
+ time (ms)
67234
+ iteration 745/ 292968 | consumed samples: 1525760 | consumed tokens: 123043840 | elapsed time per iteration (ms): 78154.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67235
+ time (ms)
67236
+ iteration 746/ 292968 | consumed samples: 1527808 | consumed tokens: 123256832 | elapsed time per iteration (ms): 79098.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67237
+ time (ms)
67238
+ iteration 747/ 292968 | consumed samples: 1529856 | consumed tokens: 123469824 | elapsed time per iteration (ms): 76901.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67239
+ time (ms)
67240
+ iteration 748/ 292968 | consumed samples: 1531904 | consumed tokens: 123682816 | elapsed time per iteration (ms): 78364.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67241
+ time (ms)
67242
+ iteration 749/ 292968 | consumed samples: 1533952 | consumed tokens: 123895808 | elapsed time per iteration (ms): 77745.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67243
+ time (ms)
67244
+ iteration 750/ 292968 | consumed samples: 1536000 | consumed tokens: 124108800 | elapsed time per iteration (ms): 76993.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67245
+ time (ms)
67246
+ iteration 751/ 292968 | consumed samples: 1538048 | consumed tokens: 124321792 | elapsed time per iteration (ms): 78065.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67247
+ time (ms)
67248
+ iteration 752/ 292968 | consumed samples: 1540096 | consumed tokens: 124534784 | elapsed time per iteration (ms): 78716.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67249
+ time (ms)
67250
+ iteration 753/ 292968 | consumed samples: 1542144 | consumed tokens: 124747776 | elapsed time per iteration (ms): 78297.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67251
+ time (ms)
67252
+ iteration 754/ 292968 | consumed samples: 1544192 | consumed tokens: 124960768 | elapsed time per iteration (ms): 81533.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67253
+ time (ms)
67254
+ iteration 755/ 292968 | consumed samples: 1546240 | consumed tokens: 125173760 | elapsed time per iteration (ms): 77260.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67255
+ time (ms)
67256
+ iteration 756/ 292968 | consumed samples: 1548288 | consumed tokens: 125386752 | elapsed time per iteration (ms): 77380.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67257
+ time (ms)
67258
+ iteration 757/ 292968 | consumed samples: 1550336 | consumed tokens: 125599744 | elapsed time per iteration (ms): 78639.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67259
+ time (ms)
67260
+ iteration 758/ 292968 | consumed samples: 1552384 | consumed tokens: 125812736 | elapsed time per iteration (ms): 78547.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67261
+ time (ms)
67262
+ iteration 759/ 292968 | consumed samples: 1554432 | consumed tokens: 126025728 | elapsed time per iteration (ms): 78637.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67263
+ time (ms)
67264
+ iteration 760/ 292968 | consumed samples: 1556480 | consumed tokens: 126238720 | elapsed time per iteration (ms): 76681.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67265
+ time (ms)
67266
+ iteration 761/ 292968 | consumed samples: 1558528 | consumed tokens: 126451712 | elapsed time per iteration (ms): 78835.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67267
+ time (ms)
67268
+ iteration 762/ 292968 | consumed samples: 1560576 | consumed tokens: 126664704 | elapsed time per iteration (ms): 78476.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67269
+ time (ms)
67270
+ iteration 763/ 292968 | consumed samples: 1562624 | consumed tokens: 126877696 | elapsed time per iteration (ms): 80815.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67271
+ time (ms)
67272
+ iteration 764/ 292968 | consumed samples: 1564672 | consumed tokens: 127090688 | elapsed time per iteration (ms): 78990.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67273
+ time (ms)
67274
+ iteration 765/ 292968 | consumed samples: 1566720 | consumed tokens: 127303680 | elapsed time per iteration (ms): 76814.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67275
+ time (ms)
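The counters in the added lines can be cross-checked with a short script. The following is a minimal sketch, not part of the training code: the names `parse_iteration_line` and `SAMPLE_LINES` are hypothetical, and the field layout is assumed from the lines shown in this diff. It parses three consecutive added lines and checks that `consumed tokens` grows by `global batch size` × `curriculum seqlen` at each step, including the step where the curriculum sequence length moves from 96 to 104.

```python
import re

# Three consecutive added lines copied verbatim from the diff above ("+" marker kept).
SAMPLE_LINES = [
    "+ iteration 724/ 292968 | consumed samples: 1482752 | consumed tokens: 118587392 | elapsed time per iteration (ms): 76610.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "+ iteration 725/ 292968 | consumed samples: 1484800 | consumed tokens: 118784000 | elapsed time per iteration (ms): 76137.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 96 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "+ iteration 726/ 292968 | consumed samples: 1486848 | consumed tokens: 118996992 | elapsed time per iteration (ms): 78329.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]

# Assumed layout: optional diff "+", then "iteration N/ TOTAL |", then "key: value |" fields.
HEAD_RE = re.compile(r"^\+?\s*iteration\s+(\d+)/\s*(\d+)\s*\|(.*)$")


def _to_number(text):
    """Return an int when the text is an integer literal, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_iteration_line(line):
    """Parse one 'iteration ...' log line into a dict; return None for other lines."""
    match = HEAD_RE.match(line.strip())
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }
    for part in match.group(3).split("|"):
        if ":" not in part:
            continue  # skips the empty piece after the trailing "|"
        key, value = part.split(":", 1)
        record[key.strip()] = _to_number(value.strip())
    return record


records = [r for r in map(parse_iteration_line, SAMPLE_LINES) if r is not None]
for prev, cur in zip(records, records[1:]):
    delta = cur["consumed tokens"] - prev["consumed tokens"]
    expected = cur["global batch size"] * cur["curriculum seqlen"]
    print(cur["iteration"], delta, delta == expected)
```

Lines that do not start with `iteration`, such as the interleaved `time (ms)` lines, are skipped because the parser returns `None` for them.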