Synthetic Data rewrite (model checkpoints)
Collection
Models trained with synthetic data generated using various synthetic rewrite methods
Torchtune logs
Step 1 | loss:1.3211456537246704 lr:1e-05 tokens_per_second_per_gpu:7769.9267578125 peak_memory_active:78.02358818054199 peak_memory_alloc:78.02358818054199 peak_memory_reserved:100.0546875
Step 2 | loss:1.335770845413208 lr:1e-05 tokens_per_second_per_gpu:9404.642578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 3 | loss:1.294201135635376 lr:1e-05 tokens_per_second_per_gpu:9389.052734375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 4 | loss:1.3177554607391357 lr:1e-05 tokens_per_second_per_gpu:9378.2646484375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 5 | loss:1.2597243785858154 lr:1e-05 tokens_per_second_per_gpu:9347.7900390625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 6 | loss:1.288907766342163 lr:1e-05 tokens_per_second_per_gpu:9397.0537109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 7 | loss:1.2272390127182007 lr:1e-05 tokens_per_second_per_gpu:1778.3231201171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 8 | loss:1.251979112625122 lr:1e-05 tokens_per_second_per_gpu:9371.7138671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 9 | loss:1.251136302947998 lr:1e-05 tokens_per_second_per_gpu:9392.8505859375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 10 | loss:1.2060770988464355 lr:1e-05 tokens_per_second_per_gpu:9335.2685546875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 11 | loss:1.1525814533233643 lr:1e-05 tokens_per_second_per_gpu:9302.8408203125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 12 | loss:1.2253873348236084 lr:1e-05 tokens_per_second_per_gpu:9354.08203125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 13 | loss:1.1733695268630981 lr:1e-05 tokens_per_second_per_gpu:1670.3487548828125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 14 | loss:1.1463207006454468 lr:1e-05 tokens_per_second_per_gpu:9348.9453125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 15 | loss:1.1960203647613525 lr:1e-05 tokens_per_second_per_gpu:9383.4306640625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 16 | loss:1.195420503616333 lr:1e-05 tokens_per_second_per_gpu:9370.5498046875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 17 | loss:1.1366651058197021 lr:1e-05 tokens_per_second_per_gpu:9367.509765625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 18 | loss:1.1670458316802979 lr:1e-05 tokens_per_second_per_gpu:9333.2626953125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 19 | loss:1.1989891529083252 lr:1e-05 tokens_per_second_per_gpu:1693.5833740234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 20 | loss:1.1334538459777832 lr:1e-05 tokens_per_second_per_gpu:9375.34765625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 21 | loss:1.103496789932251 lr:1e-05 tokens_per_second_per_gpu:9364.8037109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 22 | loss:1.1271919012069702 lr:1e-05 tokens_per_second_per_gpu:9313.0087890625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 23 | loss:1.153731346130371 lr:1e-05 tokens_per_second_per_gpu:9384.38671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 24 | loss:1.1608662605285645 lr:1e-05 tokens_per_second_per_gpu:9375.9873046875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 25 | loss:1.129477858543396 lr:1e-05 tokens_per_second_per_gpu:1712.00927734375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 26 | loss:1.177829384803772 lr:1e-05 tokens_per_second_per_gpu:9370.0888671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 27 | loss:1.0763046741485596 lr:1e-05 tokens_per_second_per_gpu:9331.28515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 28 | loss:1.1317439079284668 lr:1e-05 tokens_per_second_per_gpu:9387.8037109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 29 | loss:1.1050620079040527 lr:1e-05 tokens_per_second_per_gpu:9368.3818359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 30 | loss:1.1052101850509644 lr:1e-05 tokens_per_second_per_gpu:9347.2958984375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 31 | loss:1.111124038696289 lr:1e-05 tokens_per_second_per_gpu:1719.0673828125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 32 | loss:1.111747145652771 lr:1e-05 tokens_per_second_per_gpu:9383.083984375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 33 | loss:1.0900179147720337 lr:1e-05 tokens_per_second_per_gpu:9364.4638671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 34 | loss:1.1458609104156494 lr:1e-05 tokens_per_second_per_gpu:9375.796875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 35 | loss:1.0755043029785156 lr:1e-05 tokens_per_second_per_gpu:9294.5185546875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 36 | loss:1.0761905908584595 lr:1e-05 tokens_per_second_per_gpu:9326.791015625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 37 | loss:1.0957200527191162 lr:1e-05 tokens_per_second_per_gpu:1678.1534423828125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 38 | loss:1.0826027393341064 lr:1e-05 tokens_per_second_per_gpu:9378.7744140625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 39 | loss:1.0892133712768555 lr:1e-05 tokens_per_second_per_gpu:9329.59375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 40 | loss:1.0769106149673462 lr:1e-05 tokens_per_second_per_gpu:9374.8955078125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 41 | loss:1.0585637092590332 lr:1e-05 tokens_per_second_per_gpu:9343.6650390625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 42 | loss:1.0601545572280884 lr:1e-05 tokens_per_second_per_gpu:9350.439453125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 43 | loss:1.1050180196762085 lr:1e-05 tokens_per_second_per_gpu:1680.43701171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 44 | loss:1.0823876857757568 lr:1e-05 tokens_per_second_per_gpu:9389.78515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 45 | loss:1.0236235857009888 lr:1e-05 tokens_per_second_per_gpu:9324.6806640625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 46 | loss:1.063448429107666 lr:1e-05 tokens_per_second_per_gpu:9351.37890625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 47 | loss:1.0344215631484985 lr:1e-05 tokens_per_second_per_gpu:9329.3671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 48 | loss:1.035452961921692 lr:1e-05 tokens_per_second_per_gpu:9286.9921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 49 | loss:1.0618977546691895 lr:1e-05 tokens_per_second_per_gpu:1688.327880859375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 50 | loss:1.0246860980987549 lr:1e-05 tokens_per_second_per_gpu:9389.068359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 51 | loss:1.0447665452957153 lr:1e-05 tokens_per_second_per_gpu:9332.359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 52 | loss:1.050453543663025 lr:1e-05 tokens_per_second_per_gpu:9325.9892578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 53 | loss:1.009639024734497 lr:1e-05 tokens_per_second_per_gpu:9334.9375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 54 | loss:1.037742257118225 lr:1e-05 tokens_per_second_per_gpu:9363.2197265625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 55 | loss:0.9976449012756348 lr:1e-05 tokens_per_second_per_gpu:1702.3411865234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 56 | loss:1.059687852859497 lr:1e-05 tokens_per_second_per_gpu:9428.669921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 57 | loss:1.047523021697998 lr:1e-05 tokens_per_second_per_gpu:9372.8974609375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 58 | loss:0.999847412109375 lr:1e-05 tokens_per_second_per_gpu:9323.560546875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 59 | loss:1.000404953956604 lr:1e-05 tokens_per_second_per_gpu:9347.69140625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 60 | loss:0.9948817491531372 lr:1e-05 tokens_per_second_per_gpu:9310.076171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 61 | loss:0.9869050979614258 lr:1e-05 tokens_per_second_per_gpu:1695.3482666015625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 62 | loss:0.9814001321792603 lr:1e-05 tokens_per_second_per_gpu:9369.298828125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 63 | loss:0.9942070245742798 lr:1e-05 tokens_per_second_per_gpu:9338.666015625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 64 | loss:1.023196816444397 lr:1e-05 tokens_per_second_per_gpu:9342.8486328125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 65 | loss:0.9572965502738953 lr:1e-05 tokens_per_second_per_gpu:9319.27734375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 66 | loss:1.007226586341858 lr:1e-05 tokens_per_second_per_gpu:9357.8056640625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 67 | loss:1.0144879817962646 lr:1e-05 tokens_per_second_per_gpu:1703.5177001953125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 68 | loss:0.9142518043518066 lr:1e-05 tokens_per_second_per_gpu:9320.4384765625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 69 | loss:0.9788560271263123 lr:1e-05 tokens_per_second_per_gpu:9357.6650390625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 70 | loss:0.9135415554046631 lr:1e-05 tokens_per_second_per_gpu:9320.97265625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 71 | loss:0.9775234460830688 lr:1e-05 tokens_per_second_per_gpu:9352.4951171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 72 | loss:0.971872091293335 lr:1e-05 tokens_per_second_per_gpu:9359.1728515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 73 | loss:0.9191838502883911 lr:1e-05 tokens_per_second_per_gpu:1727.9901123046875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 74 | loss:0.9461423754692078 lr:1e-05 tokens_per_second_per_gpu:9357.9833984375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 75 | loss:0.9854860901832581 lr:1e-05 tokens_per_second_per_gpu:9372.177734375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 76 | loss:0.9368661642074585 lr:1e-05 tokens_per_second_per_gpu:9274.6591796875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 77 | loss:0.9223035573959351 lr:1e-05 tokens_per_second_per_gpu:9355.2080078125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 78 | loss:0.9389092326164246 lr:1e-05 tokens_per_second_per_gpu:9308.9150390625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 79 | loss:0.9053646326065063 lr:1e-05 tokens_per_second_per_gpu:1657.093017578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 80 | loss:0.9161912798881531 lr:1e-05 tokens_per_second_per_gpu:9404.8662109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 81 | loss:0.932512104511261 lr:1e-05 tokens_per_second_per_gpu:9376.876953125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 82 | loss:0.9379421472549438 lr:1e-05 tokens_per_second_per_gpu:9379.380859375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 83 | loss:0.9384360313415527 lr:1e-05 tokens_per_second_per_gpu:9320.111328125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 84 | loss:0.8984299302101135 lr:1e-05 tokens_per_second_per_gpu:9323.5693359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 85 | loss:0.9065302610397339 lr:1e-05 tokens_per_second_per_gpu:1663.7227783203125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 86 | loss:0.8749620914459229 lr:1e-05 tokens_per_second_per_gpu:9364.18359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 87 | loss:0.8991984128952026 lr:1e-05 tokens_per_second_per_gpu:9347.9931640625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 88 | loss:0.9098464250564575 lr:1e-05 tokens_per_second_per_gpu:9312.3896484375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 89 | loss:0.9177851676940918 lr:1e-05 tokens_per_second_per_gpu:9376.46484375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 90 | loss:0.8775016665458679 lr:1e-05 tokens_per_second_per_gpu:9301.6708984375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 91 | loss:0.8637601137161255 lr:1e-05 tokens_per_second_per_gpu:1675.364990234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 92 | loss:0.8905321359634399 lr:1e-05 tokens_per_second_per_gpu:9338.9833984375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 93 | loss:0.8570657968521118 lr:1e-05 tokens_per_second_per_gpu:9370.9296875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 94 | loss:0.8662339448928833 lr:1e-05 tokens_per_second_per_gpu:9333.689453125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 95 | loss:0.8341716527938843 lr:1e-05 tokens_per_second_per_gpu:9361.34375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 96 | loss:0.8863069415092468 lr:1e-05 tokens_per_second_per_gpu:9386.87109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 97 | loss:0.8376939296722412 lr:1e-05 tokens_per_second_per_gpu:1662.933349609375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 98 | loss:0.8456942439079285 lr:1e-05 tokens_per_second_per_gpu:9326.4150390625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 99 | loss:0.8556278944015503 lr:1e-05 tokens_per_second_per_gpu:9398.4443359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 100 | loss:0.8346591591835022 lr:1e-05 tokens_per_second_per_gpu:9352.9921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 101 | loss:0.8354505896568298 lr:1e-05 tokens_per_second_per_gpu:9330.9580078125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 102 | loss:0.8623504638671875 lr:1e-05 tokens_per_second_per_gpu:9404.7236328125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 103 | loss:0.8091994524002075 lr:1e-05 tokens_per_second_per_gpu:1670.3187255859375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 104 | loss:0.8150163292884827 lr:1e-05 tokens_per_second_per_gpu:9301.9541015625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 105 | loss:0.7894271612167358 lr:1e-05 tokens_per_second_per_gpu:9343.263671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 106 | loss:0.8139293789863586 lr:1e-05 tokens_per_second_per_gpu:9321.525390625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 107 | loss:0.8028636574745178 lr:1e-05 tokens_per_second_per_gpu:9360.947265625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 108 | loss:0.7904955744743347 lr:1e-05 tokens_per_second_per_gpu:9356.5244140625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 109 | loss:0.777755618095398 lr:1e-05 tokens_per_second_per_gpu:1692.6571044921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 110 | loss:0.7536676526069641 lr:1e-05 tokens_per_second_per_gpu:9394.283203125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 111 | loss:0.7751957178115845 lr:1e-05 tokens_per_second_per_gpu:9359.609375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 112 | loss:0.7875944375991821 lr:1e-05 tokens_per_second_per_gpu:9362.681640625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 113 | loss:0.7803712487220764 lr:1e-05 tokens_per_second_per_gpu:9325.1103515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 114 | loss:0.7660679817199707 lr:1e-05 tokens_per_second_per_gpu:9295.2841796875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 115 | loss:0.7846901416778564 lr:1e-05 tokens_per_second_per_gpu:1686.7628173828125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 116 | loss:0.7381860017776489 lr:1e-05 tokens_per_second_per_gpu:9384.9755859375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 117 | loss:0.7317936420440674 lr:1e-05 tokens_per_second_per_gpu:9361.8505859375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 118 | loss:0.7241209745407104 lr:1e-05 tokens_per_second_per_gpu:9340.3046875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 119 | loss:0.7437127828598022 lr:1e-05 tokens_per_second_per_gpu:9341.1611328125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 120 | loss:0.730717658996582 lr:1e-05 tokens_per_second_per_gpu:9345.6142578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 121 | loss:0.7170100212097168 lr:1e-05 tokens_per_second_per_gpu:1811.67578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 122 | loss:0.7376323938369751 lr:1e-05 tokens_per_second_per_gpu:9346.654296875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 123 | loss:0.693085789680481 lr:1e-05 tokens_per_second_per_gpu:9336.3701171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 124 | loss:0.7255758047103882 lr:1e-05 tokens_per_second_per_gpu:9376.4951171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 125 | loss:0.6965959072113037 lr:1e-05 tokens_per_second_per_gpu:9335.8994140625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 126 | loss:0.6836423277854919 lr:1e-05 tokens_per_second_per_gpu:9276.42578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 127 | loss:0.6677228808403015 lr:1e-05 tokens_per_second_per_gpu:1629.97607421875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 128 | loss:0.6728472709655762 lr:1e-05 tokens_per_second_per_gpu:9385.7353515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 129 | loss:0.6854134798049927 lr:1e-05 tokens_per_second_per_gpu:9385.9013671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 130 | loss:0.6719750761985779 lr:1e-05 tokens_per_second_per_gpu:9383.0595703125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 131 | loss:0.6750876903533936 lr:1e-05 tokens_per_second_per_gpu:9349.3017578125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 132 | loss:0.6597146987915039 lr:1e-05 tokens_per_second_per_gpu:9278.185546875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 133 | loss:0.628248393535614 lr:1e-05 tokens_per_second_per_gpu:1693.7158203125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 134 | loss:0.6396991014480591 lr:1e-05 tokens_per_second_per_gpu:9394.583984375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 135 | loss:0.6464598178863525 lr:1e-05 tokens_per_second_per_gpu:9365.94921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 136 | loss:0.6155937910079956 lr:1e-05 tokens_per_second_per_gpu:9365.3564453125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 137 | loss:0.6267995834350586 lr:1e-05 tokens_per_second_per_gpu:9344.6904296875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 138 | loss:0.6431571841239929 lr:1e-05 tokens_per_second_per_gpu:9314.4990234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 139 | loss:0.5968502759933472 lr:1e-05 tokens_per_second_per_gpu:1689.66064453125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 140 | loss:0.5865070223808289 lr:1e-05 tokens_per_second_per_gpu:9332.373046875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 141 | loss:0.5802854895591736 lr:1e-05 tokens_per_second_per_gpu:9342.28515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 142 | loss:0.6153531670570374 lr:1e-05 tokens_per_second_per_gpu:9377.552734375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 143 | loss:0.5920319557189941 lr:1e-05 tokens_per_second_per_gpu:9391.2490234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 144 | loss:0.6016256213188171 lr:1e-05 tokens_per_second_per_gpu:9337.443359375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 145 | loss:0.5596208572387695 lr:1e-05 tokens_per_second_per_gpu:1731.491455078125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 146 | loss:0.5697517991065979 lr:1e-05 tokens_per_second_per_gpu:9400.64453125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 147 | loss:0.55704665184021 lr:1e-05 tokens_per_second_per_gpu:9324.396484375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 148 | loss:0.5481194853782654 lr:1e-05 tokens_per_second_per_gpu:9348.1845703125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 149 | loss:0.5609490275382996 lr:1e-05 tokens_per_second_per_gpu:9346.6171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 150 | loss:0.5486775636672974 lr:1e-05 tokens_per_second_per_gpu:9362.451171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 151 | loss:0.5385841727256775 lr:1e-05 tokens_per_second_per_gpu:1671.3609619140625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 152 | loss:0.5144486427307129 lr:1e-05 tokens_per_second_per_gpu:9343.353515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 153 | loss:0.5157741904258728 lr:1e-05 tokens_per_second_per_gpu:9394.021484375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 154 | loss:0.5207873582839966 lr:1e-05 tokens_per_second_per_gpu:9327.5107421875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 155 | loss:0.5190667510032654 lr:1e-05 tokens_per_second_per_gpu:9357.3603515625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 156 | loss:0.5029794573783875 lr:1e-05 tokens_per_second_per_gpu:9357.802734375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 157 | loss:0.5049657821655273 lr:1e-05 tokens_per_second_per_gpu:1697.7664794921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 158 | loss:0.4844820499420166 lr:1e-05 tokens_per_second_per_gpu:9397.9375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 159 | loss:0.48047399520874023 lr:1e-05 tokens_per_second_per_gpu:9302.748046875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 160 | loss:0.48552030324935913 lr:1e-05 tokens_per_second_per_gpu:9343.3662109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 161 | loss:0.46948087215423584 lr:1e-05 tokens_per_second_per_gpu:9318.3134765625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 162 | loss:0.44620171189308167 lr:1e-05 tokens_per_second_per_gpu:9363.6669921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 163 | loss:0.4475747346878052 lr:1e-05 tokens_per_second_per_gpu:1637.71533203125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 164 | loss:0.43331053853034973 lr:1e-05 tokens_per_second_per_gpu:9366.7509765625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 165 | loss:0.42649805545806885 lr:1e-05 tokens_per_second_per_gpu:9325.240234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 166 | loss:0.442301481962204 lr:1e-05 tokens_per_second_per_gpu:9369.5185546875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 167 | loss:0.45347073674201965 lr:1e-05 tokens_per_second_per_gpu:9369.99609375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 168 | loss:0.4268518388271332 lr:1e-05 tokens_per_second_per_gpu:9350.9521484375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 169 | loss:0.41078904271125793 lr:1e-05 tokens_per_second_per_gpu:1688.332763671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 170 | loss:0.39554521441459656 lr:1e-05 tokens_per_second_per_gpu:9336.8740234375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 171 | loss:0.400980681180954 lr:1e-05 tokens_per_second_per_gpu:9344.09375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 172 | loss:0.39611637592315674 lr:1e-05 tokens_per_second_per_gpu:9370.2451171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 173 | loss:0.38398614525794983 lr:1e-05 tokens_per_second_per_gpu:9382.13671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 174 | loss:0.3976611793041229 lr:1e-05 tokens_per_second_per_gpu:9349.74609375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 175 | loss:0.3624931573867798 lr:1e-05 tokens_per_second_per_gpu:1696.8697509765625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 176 | loss:0.36529481410980225 lr:1e-05 tokens_per_second_per_gpu:9400.1953125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 177 | loss:0.34399187564849854 lr:1e-05 tokens_per_second_per_gpu:9325.0126953125 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 178 | loss:0.3592681288719177 lr:1e-05 tokens_per_second_per_gpu:9362.7294921875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 179 | loss:0.3464800715446472 lr:1e-05 tokens_per_second_per_gpu:9363.1904296875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
Step 180 | loss:0.35559672117233276 lr:1e-05 tokens_per_second_per_gpu:9333.8369140625 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875
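The log above can be turned into structured records for plotting or quick checks. The following is a minimal sketch, not part of torchtune: the helper names `parse_line` and `slow_steps` and the 0.5 threshold are assumptions made for illustration, and the line shape is inferred only from the entries shown here. It flags steps whose throughput falls well below the median, a pattern visible in the log at every sixth step (7, 13, 19, ...), where tokens_per_second_per_gpu drops from about 9,300 to about 1,700. The log itself does not state the cause.

```python
import re

# Assumed line shape, inferred from the log above:
#   "Step <n> | key:value key:value ..."
LINE_RE = re.compile(r"^Step (\d+) \| (.+)$")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {"step": int(match.group(1))}
    for field in match.group(2).split():
        key, _, value = field.partition(":")
        record[key] = float(value)
    return record


def slow_steps(records, fraction=0.5):
    """Return the steps whose throughput is below `fraction` of the median.

    `fraction` is a hypothetical threshold chosen for this sketch.
    """
    rates = sorted(r["tokens_per_second_per_gpu"] for r in records)
    median = rates[len(rates) // 2]
    return [
        r["step"]
        for r in records
        if r["tokens_per_second_per_gpu"] < fraction * median
    ]


# Three lines copied from the log above (steps 6 to 8).
sample = [
    "Step 6 | loss:1.288907766342163 lr:1e-05 tokens_per_second_per_gpu:9397.0537109375 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875",
    "Step 7 | loss:1.2272390127182007 lr:1e-05 tokens_per_second_per_gpu:1778.3231201171875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875",
    "Step 8 | loss:1.251979112625122 lr:1e-05 tokens_per_second_per_gpu:9371.7138671875 peak_memory_active:85.53374147415161 peak_memory_alloc:85.53374147415161 peak_memory_reserved:104.0546875",
]

records = [r for r in (parse_line(line) for line in sample) if r is not None]
print(slow_steps(records))
```

Lines that do not start with `Step <n> |` (for example the "Base model" heading) return `None` and are skipped, so the full page text can be fed through `parse_line` unchanged.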
Base model
meta-llama/Llama-3.1-8B