bigscience-bot committed on
Commit
d0a856e
1 Parent(s): 9322457
Files changed (1)
  1. logs/main_log.txt +92 -0
logs/main_log.txt CHANGED
@@ -67365,3 +67365,95 @@ time (ms)
67365
  time (ms)
67366
  iteration 811/ 292968 | consumed samples: 1660928 | consumed tokens: 137101312 | elapsed time per iteration (ms): 78796.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67367
  time (ms)
67368
+ iteration 812/ 292968 | consumed samples: 1662976 | consumed tokens: 137314304 | elapsed time per iteration (ms): 76478.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67369
+ time (ms)
67370
+ iteration 813/ 292968 | consumed samples: 1665024 | consumed tokens: 137527296 | elapsed time per iteration (ms): 78875.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67371
+ time (ms)
67372
+ iteration 814/ 292968 | consumed samples: 1667072 | consumed tokens: 137740288 | elapsed time per iteration (ms): 77038.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67373
+ time (ms)
67374
+ iteration 815/ 292968 | consumed samples: 1669120 | consumed tokens: 137953280 | elapsed time per iteration (ms): 78966.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67375
+ time (ms)
67376
+ iteration 816/ 292968 | consumed samples: 1671168 | consumed tokens: 138166272 | elapsed time per iteration (ms): 78271.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67377
+ time (ms)
67378
+ iteration 817/ 292968 | consumed samples: 1673216 | consumed tokens: 138379264 | elapsed time per iteration (ms): 78760.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67379
+ time (ms)
67380
+ iteration 818/ 292968 | consumed samples: 1675264 | consumed tokens: 138592256 | elapsed time per iteration (ms): 80164.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67381
+ time (ms)
67382
+ iteration 819/ 292968 | consumed samples: 1677312 | consumed tokens: 138805248 | elapsed time per iteration (ms): 78758.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67383
+ time (ms)
67384
+ iteration 820/ 292968 | consumed samples: 1679360 | consumed tokens: 139018240 | elapsed time per iteration (ms): 80404.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67385
+ time (ms)
67386
+ iteration 821/ 292968 | consumed samples: 1681408 | consumed tokens: 139231232 | elapsed time per iteration (ms): 77913.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67387
+ time (ms)
67388
+ iteration 822/ 292968 | consumed samples: 1683456 | consumed tokens: 139444224 | elapsed time per iteration (ms): 77540.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67389
+ time (ms)
67390
+ iteration 823/ 292968 | consumed samples: 1685504 | consumed tokens: 139657216 | elapsed time per iteration (ms): 76602.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67391
+ time (ms)
67392
+ iteration 824/ 292968 | consumed samples: 1687552 | consumed tokens: 139870208 | elapsed time per iteration (ms): 77871.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67393
+ time (ms)
67394
+ iteration 825/ 292968 | consumed samples: 1689600 | consumed tokens: 140083200 | elapsed time per iteration (ms): 81554.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67395
+ time (ms)
67396
+ iteration 826/ 292968 | consumed samples: 1691648 | consumed tokens: 140296192 | elapsed time per iteration (ms): 77593.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67397
+ time (ms)
67398
+ iteration 827/ 292968 | consumed samples: 1693696 | consumed tokens: 140509184 | elapsed time per iteration (ms): 76966.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67399
+ time (ms)
67400
+ iteration 828/ 292968 | consumed samples: 1695744 | consumed tokens: 140722176 | elapsed time per iteration (ms): 78500.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67401
+ time (ms)
67402
+ iteration 829/ 292968 | consumed samples: 1697792 | consumed tokens: 140935168 | elapsed time per iteration (ms): 78281.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67403
+ time (ms)
67404
+ iteration 830/ 292968 | consumed samples: 1699840 | consumed tokens: 141148160 | elapsed time per iteration (ms): 76785.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67405
+ time (ms)
67406
+ iteration 831/ 292968 | consumed samples: 1701888 | consumed tokens: 141361152 | elapsed time per iteration (ms): 78291.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67407
+ time (ms)
67408
+ iteration 832/ 292968 | consumed samples: 1703936 | consumed tokens: 141574144 | elapsed time per iteration (ms): 77150.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67409
+ time (ms)
67410
+ iteration 833/ 292968 | consumed samples: 1705984 | consumed tokens: 141787136 | elapsed time per iteration (ms): 79163.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67411
+ time (ms)
67412
+ iteration 834/ 292968 | consumed samples: 1708032 | consumed tokens: 142000128 | elapsed time per iteration (ms): 80157.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67413
+ time (ms)
67414
+ iteration 835/ 292968 | consumed samples: 1710080 | consumed tokens: 142213120 | elapsed time per iteration (ms): 78440.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67415
+ time (ms)
67416
+ iteration 836/ 292968 | consumed samples: 1712128 | consumed tokens: 142426112 | elapsed time per iteration (ms): 76862.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67417
+ time (ms)
67418
+ iteration 837/ 292968 | consumed samples: 1714176 | consumed tokens: 142639104 | elapsed time per iteration (ms): 78281.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67419
+ time (ms)
67420
+ iteration 838/ 292968 | consumed samples: 1716224 | consumed tokens: 142852096 | elapsed time per iteration (ms): 78619.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67421
+ time (ms)
67422
+ iteration 839/ 292968 | consumed samples: 1718272 | consumed tokens: 143065088 | elapsed time per iteration (ms): 78310.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67423
+ time (ms)
67424
+ iteration 840/ 292968 | consumed samples: 1720320 | consumed tokens: 143278080 | elapsed time per iteration (ms): 78428.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67425
+ time (ms)
67426
+ iteration 841/ 292968 | consumed samples: 1722368 | consumed tokens: 143491072 | elapsed time per iteration (ms): 78459.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67427
+ time (ms)
67428
+ iteration 842/ 292968 | consumed samples: 1724416 | consumed tokens: 143704064 | elapsed time per iteration (ms): 79007.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67429
+ time (ms)
67430
+ iteration 843/ 292968 | consumed samples: 1726464 | consumed tokens: 143917056 | elapsed time per iteration (ms): 78188.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67431
+ time (ms)
67432
+ iteration 844/ 292968 | consumed samples: 1728512 | consumed tokens: 144130048 | elapsed time per iteration (ms): 79792.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67433
+ time (ms)
67434
+ iteration 845/ 292968 | consumed samples: 1730560 | consumed tokens: 144343040 | elapsed time per iteration (ms): 79053.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67435
+ time (ms)
67436
+ iteration 846/ 292968 | consumed samples: 1732608 | consumed tokens: 144556032 | elapsed time per iteration (ms): 77709.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67437
+ time (ms)
67438
+ iteration 847/ 292968 | consumed samples: 1734656 | consumed tokens: 144769024 | elapsed time per iteration (ms): 77030.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67439
+ time (ms)
67440
+ iteration 848/ 292968 | consumed samples: 1736704 | consumed tokens: 144982016 | elapsed time per iteration (ms): 78480.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67441
+ time (ms)
67442
+ iteration 849/ 292968 | consumed samples: 1738752 | consumed tokens: 145195008 | elapsed time per iteration (ms): 79274.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67443
+ time (ms)
67444
+ iteration 850/ 292968 | consumed samples: 1740800 | consumed tokens: 145408000 | elapsed time per iteration (ms): 78104.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67445
+ time (ms)
67446
+ iteration 851/ 292968 | consumed samples: 1742848 | consumed tokens: 145620992 | elapsed time per iteration (ms): 78348.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67447
+ time (ms)
67448
+ iteration 852/ 292968 | consumed samples: 1744896 | consumed tokens: 145833984 | elapsed time per iteration (ms): 78993.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67449
+ time (ms)
67450
+ iteration 853/ 292968 | consumed samples: 1746944 | consumed tokens: 146046976 | elapsed time per iteration (ms): 78849.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67451
+ time (ms)
67452
+ iteration 854/ 292968 | consumed samples: 1748992 | consumed tokens: 146259968 | elapsed time per iteration (ms): 78395.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67453
+ time (ms)
67454
+ iteration 855/ 292968 | consumed samples: 1751040 | consumed tokens: 146472960 | elapsed time per iteration (ms): 77359.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67455
+ time (ms)
67456
+ iteration 856/ 292968 | consumed samples: 1753088 | consumed tokens: 146685952 | elapsed time per iteration (ms): 79532.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67457
+ time (ms)
67458
+ iteration 857/ 292968 | consumed samples: 1755136 | consumed tokens: 146898944 | elapsed time per iteration (ms): 77728.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67459
+ time (ms)
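
The iteration lines above share a fixed `key: value | key: value` layout, so the counters can be cross-checked mechanically. The following is a minimal sketch, not part of the training code: `parse_log_line` is a hypothetical helper name, and the two sample strings are copied from iterations 811 and 812 in this diff. It assumes that the `consumed tokens` counter advances by `global batch size` times `curriculum seqlen` per step, which is consistent with the numbers logged here.

```python
import re

def parse_log_line(line: str) -> dict:
    """Parse one iteration line of the log into a dict of numeric fields.

    Hypothetical helper for illustration; returns {} for non-iteration lines
    such as "time (ms)".
    """
    line = line.lstrip("+ ").strip()  # drop the diff marker and padding
    match = re.match(r"iteration\s+(\d+)/\s*(\d+)\s*\|(.*)", line)
    if match is None:
        return {}
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }
    for part in match.group(3).split("|"):
        if ":" not in part:
            continue  # trailing empty segment after the last "|"
        key, value = part.split(":", 1)
        record[key.strip()] = float(value)
    return record

# Two consecutive lines copied from the log above.
line_811 = (
    "iteration 811/ 292968 | consumed samples: 1660928 | consumed tokens: 137101312 | "
    "elapsed time per iteration (ms): 78796.7 | learning rate: 6.000E-05 | "
    "global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | "
    "curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |"
)
line_812 = (
    "+ iteration 812/ 292968 | consumed samples: 1662976 | consumed tokens: 137314304 | "
    "elapsed time per iteration (ms): 76478.2 | learning rate: 6.000E-05 | "
    "global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | "
    "curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |"
)

prev_step = parse_log_line(line_811)
next_step = parse_log_line(line_812)

# Per-step deltas of the two running counters.
samples_per_step = next_step["consumed samples"] - prev_step["consumed samples"]
tokens_per_step = next_step["consumed tokens"] - prev_step["consumed tokens"]

# Assumed relationship: tokens per step = global batch size * curriculum seqlen.
expected_tokens = next_step["global batch size"] * next_step["curriculum seqlen"]
```

For these two lines the sample counter advances by 2048 and the token counter by 212992, which equals 2048 × 104.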