---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1144
- Num Input Tokens Seen: 63047896

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
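These settings map roughly onto the TRL supervised fine-tuning (SFT) setup sketched below. This is a minimal, hypothetical reproduction rather than the actual training script: the training data is unknown, so the in-memory dataset is only a placeholder, and data formatting, packing, and sequence-length options are omitted.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Placeholder data: the card does not identify the training set, so this tiny
# in-memory dataset only illustrates the expected "text" column format.
train_ds = Dataset.from_dict({"text": ["example document one", "example document two"]})
eval_ds = Dataset.from_dict({"text": ["held-out example"]})

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model id; TRL loads the model and tokenizer from it
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```

Note that the total train batch size of 128 follows from 8 samples per device times 16 gradient-accumulation steps, which suggests training ran in a single process; multi-GPU setups would scale it further.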
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6564 | 0.0043 | 5 | 1.3937 | 275648 |
| 1.625 | 0.0086 | 10 | 1.3764 | 549096 |
| 1.6127 | 0.0128 | 15 | 1.3400 | 816216 |
| 1.4661 | 0.0171 | 20 | 1.2820 | 1085976 |
| 1.4385 | 0.0214 | 25 | 1.2418 | 1354672 |
| 1.4354 | 0.0257 | 30 | 1.2025 | 1622592 |
| 1.3057 | 0.0299 | 35 | 1.1783 | 1881872 |
| 1.2292 | 0.0342 | 40 | 1.1762 | 2153032 |
| 1.1657 | 0.0385 | 45 | 1.1750 | 2421888 |
| 1.0488 | 0.0428 | 50 | 1.1851 | 2689960 |
| 0.9864 | 0.0471 | 55 | 1.1953 | 2949744 |
| 0.8158 | 0.0513 | 60 | 1.2533 | 3220936 |
| 0.688 | 0.0556 | 65 | 1.2690 | 3491824 |
| 0.7325 | 0.0599 | 70 | 1.2536 | 3769288 |
| 0.5427 | 0.0642 | 75 | 1.2794 | 4039720 |
| 0.6183 | 0.0684 | 80 | 1.2494 | 4306072 |
| 0.4428 | 0.0727 | 85 | 1.2401 | 4573368 |
| 0.4545 | 0.0770 | 90 | 1.2470 | 4842648 |
| 0.4023 | 0.0813 | 95 | 1.2484 | 5114144 |
| 0.5039 | 0.0856 | 100 | 1.2291 | 5384112 |
| 0.3227 | 0.0898 | 105 | 1.2389 | 5656336 |
| 0.3485 | 0.0941 | 110 | 1.2460 | 5921680 |
| 0.3794 | 0.0984 | 115 | 1.2286 | 6189560 |
| 0.2328 | 0.1027 | 120 | 1.2411 | 6461112 |
| 0.3787 | 0.1069 | 125 | 1.2257 | 6724808 |
| 0.3868 | 0.1112 | 130 | 1.2166 | 6996328 |
| 0.3563 | 0.1155 | 135 | 1.2265 | 7265576 |
| 0.361 | 0.1198 | 140 | 1.2118 | 7535448 |
| 0.2624 | 0.1241 | 145 | 1.2149 | 7809568 |
| 0.3361 | 0.1283 | 150 | 1.2080 | 8075824 |
| 0.2209 | 0.1326 | 155 | 1.2176 | 8344136 |
| 0.3692 | 0.1369 | 160 | 1.2077 | 8617576 |
| 0.3648 | 0.1412 | 165 | 1.2167 | 8896208 |
| 0.3819 | 0.1454 | 170 | 1.1981 | 9168616 |
| 0.3246 | 0.1497 | 175 | 1.2059 | 9439392 |
| 0.2592 | 0.1540 | 180 | 1.2013 | 9712992 |
| 0.2463 | 0.1583 | 185 | 1.2000 | 9970816 |
| 0.1901 | 0.1625 | 190 | 1.1996 | 10238784 |
| 0.2588 | 0.1668 | 195 | 1.1978 | 10513696 |
| 0.346 | 0.1711 | 200 | 1.1957 | 10788672 |
| 0.1714 | 0.1754 | 205 | 1.1987 | 11064928 |
| 0.2532 | 0.1797 | 210 | 1.2013 | 11327736 |
| 0.2951 | 0.1839 | 215 | 1.1940 | 11593984 |
| 0.224 | 0.1882 | 220 | 1.2007 | 11870624 |
| 0.1832 | 0.1925 | 225 | 1.1991 | 12144200 |
| 0.3316 | 0.1968 | 230 | 1.1969 | 12410456 |
| 0.2406 | 0.2010 | 235 | 1.1887 | 12682736 |
| 0.1945 | 0.2053 | 240 | 1.1951 | 12948936 |
| 0.2001 | 0.2096 | 245 | 1.1937 | 13220632 |
| 0.2604 | 0.2139 | 250 | 1.1890 | 13495880 |
| 0.2195 | 0.2182 | 255 | 1.1908 | 13768416 |
| 0.2426 | 0.2224 | 260 | 1.1886 | 14038912 |
| 0.2231 | 0.2267 | 265 | 1.1897 | 14303120 |
| 0.215 | 0.2310 | 270 | 1.1830 | 14569728 |
| 0.2297 | 0.2353 | 275 | 1.1879 | 14842848 |
| 0.2042 | 0.2395 | 280 | 1.1844 | 15117944 |
| 0.2103 | 0.2438 | 285 | 1.1818 | 15392440 |
| 0.2358 | 0.2481 | 290 | 1.1812 | 15660888 |
| 0.2139 | 0.2524 | 295 | 1.1770 | 15932928 |
| 0.2129 | 0.2567 | 300 | 1.1832 | 16206296 |
| 0.2495 | 0.2609 | 305 | 1.1813 | 16476064 |
| 0.2447 | 0.2652 | 310 | 1.1746 | 16744344 |
| 0.2493 | 0.2695 | 315 | 1.1787 | 17017328 |
| 0.1736 | 0.2738 | 320 | 1.1757 | 17293648 |
| 0.2021 | 0.2780 | 325 | 1.1751 | 17564352 |
| 0.1906 | 0.2823 | 330 | 1.1791 | 17832488 |
| 0.1566 | 0.2866 | 335 | 1.1729 | 18101936 |
| 0.2381 | 0.2909 | 340 | 1.1767 | 18366272 |
| 0.1651 | 0.2952 | 345 | 1.1728 | 18638096 |
| 0.2087 | 0.2994 | 350 | 1.1715 | 18902976 |
| 0.1556 | 0.3037 | 355 | 1.1758 | 19179072 |
| 0.1836 | 0.3080 | 360 | 1.1743 | 19451392 |
| 0.206 | 0.3123 | 365 | 1.1675 | 19719608 |
| 0.1513 | 0.3165 | 370 | 1.1694 | 19993216 |
| 0.1117 | 0.3208 | 375 | 1.1653 | 20262080 |
| 0.1809 | 0.3251 | 380 | 1.1670 | 20529968 |
| 0.1587 | 0.3294 | 385 | 1.1727 | 20797888 |
| 0.2179 | 0.3337 | 390 | 1.1644 | 21063696 |
| 0.1565 | 0.3379 | 395 | 1.1639 | 21340488 |
| 0.1914 | 0.3422 | 400 | 1.1622 | 21610344 |
| 0.189 | 0.3465 | 405 | 1.1608 | 21888272 |
| 0.2155 | 0.3508 | 410 | 1.1624 | 22157912 |
| 0.1637 | 0.3550 | 415 | 1.1615 | 22428144 |
| 0.1893 | 0.3593 | 420 | 1.1611 | 22697424 |
| 0.1579 | 0.3636 | 425 | 1.1582 | 22970232 |
| 0.1733 | 0.3679 | 430 | 1.1619 | 23236448 |
| 0.2003 | 0.3722 | 435 | 1.1568 | 23509592 |
| 0.203 | 0.3764 | 440 | 1.1562 | 23781360 |
| 0.2085 | 0.3807 | 445 | 1.1581 | 24053160 |
| 0.2108 | 0.3850 | 450 | 1.1530 | 24327256 |
| 0.1651 | 0.3893 | 455 | 1.1540 | 24591984 |
| 0.1421 | 0.3935 | 460 | 1.1583 | 24864504 |
| 0.1734 | 0.3978 | 465 | 1.1491 | 25138208 |
| 0.247 | 0.4021 | 470 | 1.1512 | 25406984 |
| 0.214 | 0.4064 | 475 | 1.1536 | 25672240 |
| 0.2141 | 0.4107 | 480 | 1.1522 | 25938408 |
| 0.1223 | 0.4149 | 485 | 1.1535 | 26207792 |
| 0.1772 | 0.4192 | 490 | 1.1535 | 26472776 |
| 0.2028 | 0.4235 | 495 | 1.1473 | 26747664 |
| 0.1715 | 0.4278 | 500 | 1.1493 | 27015688 |
| 0.2138 | 0.4320 | 505 | 1.1453 | 27278504 |
| 0.1572 | 0.4363 | 510 | 1.1478 | 27547848 |
| 0.1712 | 0.4406 | 515 | 1.1450 | 27809848 |
| 0.213 | 0.4449 | 520 | 1.1468 | 28083624 |
| 0.2085 | 0.4491 | 525 | 1.1469 | 28357112 |
| 0.1312 | 0.4534 | 530 | 1.1428 | 28624624 |
| 0.1982 | 0.4577 | 535 | 1.1426 | 28895280 |
| 0.1566 | 0.4620 | 540 | 1.1468 | 29159584 |
| 0.1547 | 0.4663 | 545 | 1.1453 | 29429200 |
| 0.2244 | 0.4705 | 550 | 1.1428 | 29697536 |
| 0.1952 | 0.4748 | 555 | 1.1441 | 29966616 |
| 0.1646 | 0.4791 | 560 | 1.1420 | 30234376 |
| 0.1243 | 0.4834 | 565 | 1.1418 | 30509392 |
| 0.1995 | 0.4876 | 570 | 1.1419 | 30785368 |
| 0.1989 | 0.4919 | 575 | 1.1398 | 31060456 |
| 0.2007 | 0.4962 | 580 | 1.1386 | 31326208 |
| 0.1472 | 0.5005 | 585 | 1.1393 | 31594472 |
| 0.1106 | 0.5048 | 590 | 1.1399 | 31860304 |
| 0.2542 | 0.5090 | 595 | 1.1378 | 32132960 |
| 0.2023 | 0.5133 | 600 | 1.1358 | 32408064 |
| 0.1613 | 0.5176 | 605 | 1.1389 | 32680560 |
| 0.1493 | 0.5219 | 610 | 1.1369 | 32954248 |
| 0.1255 | 0.5261 | 615 | 1.1378 | 33215640 |
| 0.0936 | 0.5304 | 620 | 1.1401 | 33485632 |
| 0.1824 | 0.5347 | 625 | 1.1382 | 33756656 |
| 0.2243 | 0.5390 | 630 | 1.1390 | 34026464 |
| 0.1573 | 0.5433 | 635 | 1.1361 | 34299816 |
| 0.1638 | 0.5475 | 640 | 1.1352 | 34570872 |
| 0.1157 | 0.5518 | 645 | 1.1360 | 34838312 |
| 0.1701 | 0.5561 | 650 | 1.1342 | 35106056 |
| 0.2314 | 0.5604 | 655 | 1.1337 | 35374072 |
| 0.1754 | 0.5646 | 660 | 1.1351 | 35634464 |
| 0.1703 | 0.5689 | 665 | 1.1320 | 35907424 |
| 0.2359 | 0.5732 | 670 | 1.1314 | 36170096 |
| 0.2349 | 0.5775 | 675 | 1.1329 | 36442024 |
| 0.1305 | 0.5818 | 680 | 1.1308 | 36706288 |
| 0.1876 | 0.5860 | 685 | 1.1312 | 36973688 |
| 0.1347 | 0.5903 | 690 | 1.1320 | 37241296 |
| 0.2262 | 0.5946 | 695 | 1.1314 | 37512872 |
| 0.1998 | 0.5989 | 700 | 1.1326 | 37782680 |
| 0.1055 | 0.6031 | 705 | 1.1304 | 38053608 |
| 0.2393 | 0.6074 | 710 | 1.1302 | 38325008 |
| 0.1775 | 0.6117 | 715 | 1.1307 | 38589416 |
| 0.2197 | 0.6160 | 720 | 1.1277 | 38853576 |
| 0.166 | 0.6203 | 725 | 1.1256 | 39122008 |
| 0.1593 | 0.6245 | 730 | 1.1300 | 39396560 |
| 0.1923 | 0.6288 | 735 | 1.1328 | 39666480 |
| 0.1976 | 0.6331 | 740 | 1.1306 | 39934776 |
| 0.1625 | 0.6374 | 745 | 1.1272 | 40198928 |
| 0.1268 | 0.6416 | 750 | 1.1290 | 40474816 |
| 0.219 | 0.6459 | 755 | 1.1289 | 40738928 |
| 0.2275 | 0.6502 | 760 | 1.1235 | 41014112 |
| 0.0704 | 0.6545 | 765 | 1.1265 | 41291400 |
| 0.1353 | 0.6588 | 770 | 1.1284 | 41567064 |
| 0.1344 | 0.6630 | 775 | 1.1257 | 41835856 |
| 0.1868 | 0.6673 | 780 | 1.1241 | 42108416 |
| 0.2027 | 0.6716 | 785 | 1.1269 | 42376552 |
| 0.1119 | 0.6759 | 790 | 1.1281 | 42639272 |
| 0.1379 | 0.6801 | 795 | 1.1261 | 42911096 |
| 0.2652 | 0.6844 | 800 | 1.1265 | 43184912 |
| 0.1232 | 0.6887 | 805 | 1.1253 | 43452840 |
| 0.1459 | 0.6930 | 810 | 1.1239 | 43719024 |
| 0.1376 | 0.6973 | 815 | 1.1257 | 43982968 |
| 0.1484 | 0.7015 | 820 | 1.1273 | 44251808 |
| 0.1617 | 0.7058 | 825 | 1.1248 | 44520088 |
| 0.1703 | 0.7101 | 830 | 1.1240 | 44782312 |
| 0.2121 | 0.7144 | 835 | 1.1246 | 45055208 |
| 0.1987 | 0.7186 | 840 | 1.1221 | 45329256 |
| 0.1687 | 0.7229 | 845 | 1.1218 | 45600800 |
| 0.1417 | 0.7272 | 850 | 1.1245 | 45871688 |
| 0.2093 | 0.7315 | 855 | 1.1243 | 46145112 |
| 0.1644 | 0.7358 | 860 | 1.1260 | 46416248 |
| 0.17 | 0.7400 | 865 | 1.1265 | 46685400 |
| 0.197 | 0.7443 | 870 | 1.1215 | 46949488 |
| 0.2171 | 0.7486 | 875 | 1.1240 | 47221208 |
| 0.148 | 0.7529 | 880 | 1.1252 | 47503016 |
| 0.1472 | 0.7571 | 885 | 1.1223 | 47771504 |
| 0.0773 | 0.7614 | 890 | 1.1200 | 48043096 |
| 0.1024 | 0.7657 | 895 | 1.1236 | 48310640 |
| 0.0715 | 0.7700 | 900 | 1.1226 | 48579272 |
| 0.161 | 0.7742 | 905 | 1.1208 | 48845664 |
| 0.2209 | 0.7785 | 910 | 1.1225 | 49116328 |
| 0.2193 | 0.7828 | 915 | 1.1227 | 49384192 |
| 0.1065 | 0.7871 | 920 | 1.1213 | 49653128 |
| 0.1488 | 0.7914 | 925 | 1.1221 | 49933168 |
| 0.2447 | 0.7956 | 930 | 1.1200 | 50208440 |
| 0.1157 | 0.7999 | 935 | 1.1216 | 50474600 |
| 0.1756 | 0.8042 | 940 | 1.1227 | 50741896 |
| 0.1873 | 0.8085 | 945 | 1.1186 | 51008128 |
| 0.1736 | 0.8127 | 950 | 1.1199 | 51282936 |
| 0.1495 | 0.8170 | 955 | 1.1226 | 51545616 |
| 0.1663 | 0.8213 | 960 | 1.1194 | 51809832 |
| 0.1343 | 0.8256 | 965 | 1.1184 | 52083672 |
| 0.1252 | 0.8299 | 970 | 1.1195 | 52355144 |
| 0.111 | 0.8341 | 975 | 1.1202 | 52630616 |
| 0.1025 | 0.8384 | 980 | 1.1203 | 52908440 |
| 0.1644 | 0.8427 | 985 | 1.1195 | 53182968 |
| 0.1614 | 0.8470 | 990 | 1.1192 | 53448960 |
| 0.1156 | 0.8512 | 995 | 1.1206 | 53722632 |
| 0.1378 | 0.8555 | 1000 | 1.1192 | 53998512 |
| 0.1776 | 0.8598 | 1005 | 1.1169 | 54263744 |
| 0.2257 | 0.8641 | 1010 | 1.1174 | 54526592 |
| 0.1631 | 0.8684 | 1015 | 1.1210 | 54792792 |
| 0.1759 | 0.8726 | 1020 | 1.1169 | 55069680 |
| 0.1197 | 0.8769 | 1025 | 1.1142 | 55350464 |
| 0.1768 | 0.8812 | 1030 | 1.1170 | 55621960 |
| 0.2284 | 0.8855 | 1035 | 1.1190 | 55896744 |
| 0.1251 | 0.8897 | 1040 | 1.1156 | 56164720 |
| 0.1812 | 0.8940 | 1045 | 1.1176 | 56434136 |
| 0.234 | 0.8983 | 1050 | 1.1171 | 56709136 |
| 0.1637 | 0.9026 | 1055 | 1.1145 | 56974616 |
| 0.1279 | 0.9069 | 1060 | 1.1162 | 57242824 |
| 0.1495 | 0.9111 | 1065 | 1.1177 | 57511368 |
| 0.155 | 0.9154 | 1070 | 1.1181 | 57774344 |
| 0.2235 | 0.9197 | 1075 | 1.1162 | 58043560 |
| 0.126 | 0.9240 | 1080 | 1.1158 | 58312920 |
| 0.1786 | 0.9282 | 1085 | 1.1173 | 58587160 |
| 0.1193 | 0.9325 | 1090 | 1.1163 | 58858704 |
| 0.1405 | 0.9368 | 1095 | 1.1142 | 59120792 |
| 0.2019 | 0.9411 | 1100 | 1.1165 | 59388184 |
| 0.2109 | 0.9454 | 1105 | 1.1159 | 59648456 |
| 0.1786 | 0.9496 | 1110 | 1.1163 | 59925824 |
| 0.1741 | 0.9539 | 1115 | 1.1162 | 60199640 |
| 0.1791 | 0.9582 | 1120 | 1.1137 | 60469672 |
| 0.1162 | 0.9625 | 1125 | 1.1154 | 60742672 |
| 0.1385 | 0.9667 | 1130 | 1.1159 | 61012624 |
| 0.1489 | 0.9710 | 1135 | 1.1142 | 61279728 |
| 0.1068 | 0.9753 | 1140 | 1.1141 | 61546392 |
| 0.1712 | 0.9796 | 1145 | 1.1140 | 61811624 |
| 0.1502 | 0.9839 | 1150 | 1.1128 | 62076504 |
| 0.1743 | 0.9881 | 1155 | 1.1140 | 62348416 |
| 0.1894 | 0.9924 | 1160 | 1.1132 | 62611880 |
| 0.1271 | 0.9967 | 1165 | 1.1129 | 62884000 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
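As a quick usage reference, here is a minimal sketch of loading this checkpoint with `transformers` for text generation. The repository id below is assumed from the model name; replace it with the actual Hub id or local path of this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed id, taken from the model name above; adjust namespace/path as needed.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision load; adjust to your hardware
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```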