---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0988
- Num Input Tokens Seen: 66474032

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
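The training script and dataset are not published with this card, so the snippet below is only a minimal sketch of how the hyperparameters above could map onto TRL's `SFTTrainer` (the `trl` / `sft` tags suggest it was used). The dataset path, the `text` field, and the exact TRL version are assumptions, not the actual setup behind this checkpoint.

```python
# Hypothetical reconstruction of the training configuration listed above.
# "your/sft-dataset" and dataset_text_field="text" are placeholders; the real
# fine-tuning data for this model is unknown.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

dataset = load_dataset("your/sft-dataset")  # placeholder dataset

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # 8 x 16 = total_train_batch_size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the TrainingArguments defaults.
    dataset_text_field="text",        # placeholder field name
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```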
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.6496 | 0.0040 | 5 | 1.3888 | 269088 |
| 1.593 | 0.0080 | 10 | 1.3738 | 532768 |
| 1.6319 | 0.0121 | 15 | 1.3415 | 797616 |
| 1.4412 | 0.0161 | 20 | 1.2878 | 1059528 |
| 1.4212 | 0.0201 | 25 | 1.2452 | 1330056 |
| 1.3105 | 0.0241 | 30 | 1.2106 | 1595496 |
| 1.2004 | 0.0281 | 35 | 1.1870 | 1856096 |
| 1.122 | 0.0321 | 40 | 1.1980 | 2128096 |
| 0.9857 | 0.0362 | 45 | 1.2116 | 2397312 |
| 0.8123 | 0.0402 | 50 | 1.2458 | 2660648 |
| 0.6974 | 0.0442 | 55 | 1.2866 | 2923440 |
| 0.5779 | 0.0482 | 60 | 1.2544 | 3190904 |
| 0.6053 | 0.0522 | 65 | 1.2958 | 3466056 |
| 0.377 | 0.0562 | 70 | 1.2836 | 3738872 |
| 0.437 | 0.0603 | 75 | 1.2394 | 4008880 |
| 0.2844 | 0.0643 | 80 | 1.2326 | 4270216 |
| 0.2743 | 0.0683 | 85 | 1.2176 | 4534544 |
| 0.2454 | 0.0723 | 90 | 1.2031 | 4797656 |
| 0.3017 | 0.0763 | 95 | 1.2110 | 5064904 |
| 0.2919 | 0.0804 | 100 | 1.1901 | 5325960 |
| 0.2755 | 0.0844 | 105 | 1.1899 | 5588816 |
| 0.2508 | 0.0884 | 110 | 1.1932 | 5859792 |
| 0.2048 | 0.0924 | 115 | 1.1895 | 6124448 |
| 0.1805 | 0.0964 | 120 | 1.1991 | 6394440 |
| 0.2482 | 0.1004 | 125 | 1.1865 | 6660424 |
| 0.2114 | 0.1045 | 130 | 1.1828 | 6925280 |
| 0.2454 | 0.1085 | 135 | 1.1801 | 7192496 |
| 0.2305 | 0.1125 | 140 | 1.1733 | 7456696 |
| 0.1829 | 0.1165 | 145 | 1.1778 | 7723888 |
| 0.2417 | 0.1205 | 150 | 1.1796 | 7998624 |
| 0.1485 | 0.1245 | 155 | 1.1714 | 8271672 |
| 0.1433 | 0.1286 | 160 | 1.1770 | 8546408 |
| 0.2375 | 0.1326 | 165 | 1.1716 | 8816744 |
| 0.1699 | 0.1366 | 170 | 1.1698 | 9086496 |
| 0.1136 | 0.1406 | 175 | 1.1651 | 9346888 |
| 0.1336 | 0.1446 | 180 | 1.1702 | 9619312 |
| 0.1598 | 0.1487 | 185 | 1.1609 | 9885952 |
| 0.0921 | 0.1527 | 190 | 1.1622 | 10153872 |
| 0.2749 | 0.1567 | 195 | 1.1658 | 10421200 |
| 0.2119 | 0.1607 | 200 | 1.1574 | 10694680 |
| 0.2545 | 0.1647 | 205 | 1.1574 | 10966232 |
| 0.242 | 0.1687 | 210 | 1.1530 | 11232608 |
| 0.1785 | 0.1728 | 215 | 1.1555 | 11495504 |
| 0.2243 | 0.1768 | 220 | 1.1555 | 11761088 |
| 0.257 | 0.1808 | 225 | 1.1501 | 12034208 |
| 0.1593 | 0.1848 | 230 | 1.1525 | 12297864 |
| 0.2022 | 0.1888 | 235 | 1.1533 | 12565760 |
| 0.2072 | 0.1928 | 240 | 1.1519 | 12833240 |
| 0.1091 | 0.1969 | 245 | 1.1511 | 13102024 |
| 0.0845 | 0.2009 | 250 | 1.1520 | 13362240 |
| 0.2093 | 0.2049 | 255 | 1.1502 | 13625696 |
| 0.1741 | 0.2089 | 260 | 1.1467 | 13890168 |
| 0.1188 | 0.2129 | 265 | 1.1540 | 14154448 |
| 0.3031 | 0.2170 | 270 | 1.1497 | 14419648 |
| 0.1891 | 0.2210 | 275 | 1.1464 | 14674784 |
| 0.2016 | 0.2250 | 280 | 1.1447 | 14949488 |
| 0.1007 | 0.2290 | 285 | 1.1460 | 15214800 |
| 0.1779 | 0.2330 | 290 | 1.1475 | 15483240 |
| 0.195 | 0.2370 | 295 | 1.1398 | 15751536 |
| 0.2069 | 0.2411 | 300 | 1.1429 | 16014584 |
| 0.1597 | 0.2451 | 305 | 1.1420 | 16277072 |
| 0.111 | 0.2491 | 310 | 1.1397 | 16540864 |
| 0.107 | 0.2531 | 315 | 1.1423 | 16804568 |
| 0.1212 | 0.2571 | 320 | 1.1387 | 17077128 |
| 0.1412 | 0.2611 | 325 | 1.1382 | 17348320 |
| 0.1192 | 0.2652 | 330 | 1.1419 | 17612576 |
| 0.1879 | 0.2692 | 335 | 1.1388 | 17876784 |
| 0.1433 | 0.2732 | 340 | 1.1362 | 18142544 |
| 0.1748 | 0.2772 | 345 | 1.1411 | 18415672 |
| 0.1677 | 0.2812 | 350 | 1.1373 | 18683536 |
| 0.1358 | 0.2853 | 355 | 1.1346 | 18952888 |
| 0.1712 | 0.2893 | 360 | 1.1369 | 19218360 |
| 0.1619 | 0.2933 | 365 | 1.1386 | 19483840 |
| 0.1071 | 0.2973 | 370 | 1.1347 | 19756976 |
| 0.2192 | 0.3013 | 375 | 1.1322 | 20022776 |
| 0.1235 | 0.3053 | 380 | 1.1334 | 20289712 |
| 0.2287 | 0.3094 | 385 | 1.1345 | 20559104 |
| 0.1922 | 0.3134 | 390 | 1.1295 | 20823864 |
| 0.1379 | 0.3174 | 395 | 1.1306 | 21082544 |
| 0.109 | 0.3214 | 400 | 1.1325 | 21356280 |
| 0.1387 | 0.3254 | 405 | 1.1298 | 21630688 |
| 0.1094 | 0.3294 | 410 | 1.1290 | 21895440 |
| 0.1573 | 0.3335 | 415 | 1.1295 | 22163328 |
| 0.1252 | 0.3375 | 420 | 1.1275 | 22422360 |
| 0.1323 | 0.3415 | 425 | 1.1309 | 22693992 |
| 0.1553 | 0.3455 | 430 | 1.1275 | 22960416 |
| 0.0841 | 0.3495 | 435 | 1.1282 | 23224648 |
| 0.1479 | 0.3536 | 440 | 1.1303 | 23485960 |
| 0.1776 | 0.3576 | 445 | 1.1319 | 23757080 |
| 0.1108 | 0.3616 | 450 | 1.1295 | 24019992 |
| 0.1577 | 0.3656 | 455 | 1.1281 | 24283712 |
| 0.1419 | 0.3696 | 460 | 1.1281 | 24555736 |
| 0.1669 | 0.3736 | 465 | 1.1274 | 24819064 |
| 0.175 | 0.3777 | 470 | 1.1248 | 25091464 |
| 0.1287 | 0.3817 | 475 | 1.1257 | 25360944 |
| 0.1303 | 0.3857 | 480 | 1.1300 | 25627840 |
| 0.2149 | 0.3897 | 485 | 1.1238 | 25895920 |
| 0.1754 | 0.3937 | 490 | 1.1214 | 26159488 |
| 0.1381 | 0.3978 | 495 | 1.1240 | 26425400 |
| 0.1971 | 0.4018 | 500 | 1.1243 | 26695288 |
| 0.1112 | 0.4058 | 505 | 1.1231 | 26958128 |
| 0.1507 | 0.4098 | 510 | 1.1190 | 27224768 |
| 0.2245 | 0.4138 | 515 | 1.1196 | 27490376 |
| 0.1332 | 0.4178 | 520 | 1.1214 | 27759472 |
| 0.2522 | 0.4219 | 525 | 1.1237 | 28021432 |
| 0.1485 | 0.4259 | 530 | 1.1195 | 28293960 |
| 0.1108 | 0.4299 | 535 | 1.1196 | 28565520 |
| 0.1354 | 0.4339 | 540 | 1.1205 | 28830248 |
| 0.188 | 0.4379 | 545 | 1.1186 | 29098632 |
| 0.1505 | 0.4419 | 550 | 1.1169 | 29366008 |
| 0.2583 | 0.4460 | 555 | 1.1186 | 29631632 |
| 0.1734 | 0.4500 | 560 | 1.1181 | 29892432 |
| 0.1396 | 0.4540 | 565 | 1.1191 | 30155064 |
| 0.147 | 0.4580 | 570 | 1.1185 | 30425328 |
| 0.1781 | 0.4620 | 575 | 1.1157 | 30687912 |
| 0.087 | 0.4661 | 580 | 1.1194 | 30955536 |
| 0.1667 | 0.4701 | 585 | 1.1211 | 31223528 |
| 0.2041 | 0.4741 | 590 | 1.1164 | 31486616 |
| 0.1368 | 0.4781 | 595 | 1.1163 | 31756680 |
| 0.1193 | 0.4821 | 600 | 1.1166 | 32029360 |
| 0.1863 | 0.4861 | 605 | 1.1142 | 32300840 |
| 0.1692 | 0.4902 | 610 | 1.1145 | 32559992 |
| 0.1551 | 0.4942 | 615 | 1.1158 | 32820160 |
| 0.1233 | 0.4982 | 620 | 1.1139 | 33090856 |
| 0.2353 | 0.5022 | 625 | 1.1132 | 33356216 |
| 0.0917 | 0.5062 | 630 | 1.1161 | 33627544 |
| 0.1523 | 0.5102 | 635 | 1.1159 | 33898952 |
| 0.1818 | 0.5143 | 640 | 1.1135 | 34166040 |
| 0.0914 | 0.5183 | 645 | 1.1139 | 34432080 |
| 0.1609 | 0.5223 | 650 | 1.1142 | 34695128 |
| 0.1164 | 0.5263 | 655 | 1.1137 | 34960016 |
| 0.1476 | 0.5303 | 660 | 1.1127 | 35227024 |
| 0.1514 | 0.5344 | 665 | 1.1138 | 35502752 |
| 0.1921 | 0.5384 | 670 | 1.1135 | 35777480 |
| 0.1547 | 0.5424 | 675 | 1.1111 | 36051128 |
| 0.1647 | 0.5464 | 680 | 1.1128 | 36324632 |
| 0.1431 | 0.5504 | 685 | 1.1132 | 36599048 |
| 0.1537 | 0.5544 | 690 | 1.1113 | 36868312 |
| 0.1508 | 0.5585 | 695 | 1.1119 | 37137304 |
| 0.1446 | 0.5625 | 700 | 1.1121 | 37400984 |
| 0.1871 | 0.5665 | 705 | 1.1104 | 37670160 |
| 0.1148 | 0.5705 | 710 | 1.1093 | 37937456 |
| 0.1809 | 0.5745 | 715 | 1.1107 | 38213656 |
| 0.1562 | 0.5785 | 720 | 1.1134 | 38481208 |
| 0.1856 | 0.5826 | 725 | 1.1124 | 38748528 |
| 0.2117 | 0.5866 | 730 | 1.1110 | 39014688 |
| 0.1334 | 0.5906 | 735 | 1.1086 | 39285112 |
| 0.1282 | 0.5946 | 740 | 1.1083 | 39558336 |
| 0.1079 | 0.5986 | 745 | 1.1078 | 39816608 |
| 0.2084 | 0.6027 | 750 | 1.1080 | 40081864 |
| 0.1388 | 0.6067 | 755 | 1.1099 | 40349832 |
| 0.1496 | 0.6107 | 760 | 1.1095 | 40617056 |
| 0.123 | 0.6147 | 765 | 1.1066 | 40887032 |
| 0.0792 | 0.6187 | 770 | 1.1065 | 41148104 |
| 0.1639 | 0.6227 | 775 | 1.1086 | 41423424 |
| 0.2501 | 0.6268 | 780 | 1.1078 | 41700288 |
| 0.115 | 0.6308 | 785 | 1.1090 | 41971832 |
| 0.1738 | 0.6348 | 790 | 1.1083 | 42239944 |
| 0.1595 | 0.6388 | 795 | 1.1061 | 42497488 |
| 0.1121 | 0.6428 | 800 | 1.1059 | 42763824 |
| 0.1503 | 0.6468 | 805 | 1.1075 | 43033424 |
| 0.0887 | 0.6509 | 810 | 1.1048 | 43299520 |
| 0.1208 | 0.6549 | 815 | 1.1063 | 43567272 |
| 0.1165 | 0.6589 | 820 | 1.1090 | 43830216 |
| 0.136 | 0.6629 | 825 | 1.1080 | 44101312 |
| 0.1441 | 0.6669 | 830 | 1.1059 | 44372208 |
| 0.1372 | 0.6710 | 835 | 1.1074 | 44629960 |
| 0.0905 | 0.6750 | 840 | 1.1078 | 44894304 |
| 0.17 | 0.6790 | 845 | 1.1058 | 45163432 |
| 0.1861 | 0.6830 | 850 | 1.1047 | 45430264 |
| 0.1535 | 0.6870 | 855 | 1.1053 | 45705032 |
| 0.2079 | 0.6910 | 860 | 1.1057 | 45973272 |
| 0.1795 | 0.6951 | 865 | 1.1057 | 46238200 |
| 0.1819 | 0.6991 | 870 | 1.1061 | 46508080 |
| 0.1625 | 0.7031 | 875 | 1.1057 | 46775056 |
| 0.157 | 0.7071 | 880 | 1.1041 | 47045584 |
| 0.1586 | 0.7111 | 885 | 1.1041 | 47315400 |
| 0.1219 | 0.7151 | 890 | 1.1043 | 47581088 |
| 0.1534 | 0.7192 | 895 | 1.1045 | 47844512 |
| 0.1423 | 0.7232 | 900 | 1.1032 | 48114328 |
| 0.1358 | 0.7272 | 905 | 1.1040 | 48380520 |
| 0.127 | 0.7312 | 910 | 1.1042 | 48649872 |
| 0.1462 | 0.7352 | 915 | 1.1043 | 48920232 |
| 0.154 | 0.7393 | 920 | 1.1035 | 49186984 |
| 0.1847 | 0.7433 | 925 | 1.1041 | 49454928 |
| 0.1678 | 0.7473 | 930 | 1.1053 | 49722280 |
| 0.1658 | 0.7513 | 935 | 1.1050 | 49988024 |
| 0.1301 | 0.7553 | 940 | 1.1053 | 50255760 |
| 0.1239 | 0.7593 | 945 | 1.1044 | 50530080 |
| 0.1458 | 0.7634 | 950 | 1.1037 | 50792368 |
| 0.152 | 0.7674 | 955 | 1.1041 | 51052328 |
| 0.1736 | 0.7714 | 960 | 1.1041 | 51318808 |
| 0.1981 | 0.7754 | 965 | 1.1030 | 51586904 |
| 0.1032 | 0.7794 | 970 | 1.1021 | 51861168 |
| 0.1126 | 0.7834 | 975 | 1.1050 | 52129208 |
| 0.2006 | 0.7875 | 980 | 1.1045 | 52395312 |
| 0.2615 | 0.7915 | 985 | 1.1011 | 52661168 |
| 0.1574 | 0.7955 | 990 | 1.1013 | 52923160 |
| 0.183 | 0.7995 | 995 | 1.1067 | 53179296 |
| 0.1247 | 0.8035 | 1000 | 1.1045 | 53445496 |
| 0.136 | 0.8076 | 1005 | 1.1013 | 53714992 |
| 0.2123 | 0.8116 | 1010 | 1.1015 | 53973440 |
| 0.1449 | 0.8156 | 1015 | 1.1025 | 54238472 |
| 0.2289 | 0.8196 | 1020 | 1.1019 | 54508944 |
| 0.1454 | 0.8236 | 1025 | 1.1013 | 54782640 |
| 0.1422 | 0.8276 | 1030 | 1.1022 | 55052512 |
| 0.1588 | 0.8317 | 1035 | 1.1022 | 55320536 |
| 0.1174 | 0.8357 | 1040 | 1.1024 | 55587976 |
| 0.1778 | 0.8397 | 1045 | 1.1006 | 55850544 |
| 0.2064 | 0.8437 | 1050 | 1.1019 | 56111488 |
| 0.1348 | 0.8477 | 1055 | 1.1043 | 56379936 |
| 0.1454 | 0.8517 | 1060 | 1.1027 | 56633752 |
| 0.0895 | 0.8558 | 1065 | 1.0997 | 56900624 |
| 0.1199 | 0.8598 | 1070 | 1.1008 | 57165704 |
| 0.1866 | 0.8638 | 1075 | 1.1013 | 57431640 |
| 0.1512 | 0.8678 | 1080 | 1.1002 | 57697040 |
| 0.1935 | 0.8718 | 1085 | 1.1003 | 57971200 |
| 0.1479 | 0.8759 | 1090 | 1.1003 | 58235216 |
| 0.1603 | 0.8799 | 1095 | 1.1010 | 58505320 |
| 0.1545 | 0.8839 | 1100 | 1.1004 | 58781952 |
| 0.1349 | 0.8879 | 1105 | 1.0978 | 59054312 |
| 0.1038 | 0.8919 | 1110 | 1.0981 | 59316192 |
| 0.2127 | 0.8959 | 1115 | 1.0985 | 59576760 |
| 0.2207 | 0.9000 | 1120 | 1.0978 | 59841800 |
| 0.1447 | 0.9040 | 1125 | 1.0980 | 60108152 |
| 0.1445 | 0.9080 | 1130 | 1.0986 | 60381688 |
| 0.123 | 0.9120 | 1135 | 1.0985 | 60644416 |
| 0.1337 | 0.9160 | 1140 | 1.0972 | 60914960 |
| 0.1519 | 0.9200 | 1145 | 1.0964 | 61189320 |
| 0.1618 | 0.9241 | 1150 | 1.0997 | 61451944 |
| 0.1586 | 0.9281 | 1155 | 1.1000 | 61715960 |
| 0.1538 | 0.9321 | 1160 | 1.0981 | 61986840 |
| 0.0929 | 0.9361 | 1165 | 1.0972 | 62255312 |
| 0.1543 | 0.9401 | 1170 | 1.0973 | 62523592 |
| 0.1406 | 0.9442 | 1175 | 1.0976 | 62795320 |
| 0.1527 | 0.9482 | 1180 | 1.0970 | 63061184 |
| 0.1556 | 0.9522 | 1185 | 1.0975 | 63326856 |
| 0.2417 | 0.9562 | 1190 | 1.0983 | 63598528 |
| 0.1064 | 0.9602 | 1195 | 1.1001 | 63861592 |
| 0.1908 | 0.9642 | 1200 | 1.0971 | 64129760 |
| 0.1303 | 0.9683 | 1205 | 1.0958 | 64399112 |
| 0.1397 | 0.9723 | 1210 | 1.0972 | 64666312 |
| 0.1802 | 0.9763 | 1215 | 1.0971 | 64938056 |
| 0.1478 | 0.9803 | 1220 | 1.0970 | 65198400 |
| 0.1511 | 0.9843 | 1225 | 1.0966 | 65460480 |
| 0.1352 | 0.9883 | 1230 | 1.0973 | 65730520 |
| 0.1681 | 0.9924 | 1235 | 1.0983 | 65993712 |
| 0.1158 | 0.9964 | 1240 | 1.0982 | 66264848 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
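For completeness, a hedged loading example with the Transformers versions listed above. The Hub namespace hosting this checkpoint is not stated in the card, so `<namespace>` is a placeholder you must replace with the actual repository owner.

```python
# Sketch: loading this fine-tuned checkpoint for text generation.
# "<namespace>" is a placeholder for the unknown Hub organization/user.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<namespace>/collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 checkpoints are typically used in bf16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```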