---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1062
- Num Input Tokens Seen: 79934792

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.7516 | 0.0034 | 5 | 1.3951 | 270272 |
| 1.6881 | 0.0067 | 10 | 1.3852 | 541984 |
| 1.7126 | 0.0101 | 15 | 1.3562 | 811320 |
| 1.605 | 0.0134 | 20 | 1.3144 | 1078944 |
| 1.5546 | 0.0168 | 25 | 1.2661 | 1338944 |
| 1.395 | 0.0201 | 30 | 1.2310 | 1604936 |
| 1.3409 | 0.0235 | 35 | 1.1967 | 1877640 |
| 1.257 | 0.0268 | 40 | 1.1816 | 2139560 |
| 1.1392 | 0.0302 | 45 | 1.1869 | 2412232 |
| 1.0556 | 0.0335 | 50 | 1.1939 | 2687216 |
| 0.9911 | 0.0369 | 55 | 1.2188 | 2952704 |
| 0.8439 | 0.0402 | 60 | 1.2330 | 3219848 |
| 0.7578 | 0.0436 | 65 | 1.2815 | 3486600 |
| 0.6959 | 0.0469 | 70 | 1.2725 | 3761912 |
| 0.6015 | 0.0503 | 75 | 1.3182 | 4023352 |
| 0.4577 | 0.0536 | 80 | 1.3007 | 4291488 |
| 0.3911 | 0.0570 | 85 | 1.3014 | 4552552 |
| 0.3724 | 0.0603 | 90 | 1.2987 | 4821400 |
| 0.3798 | 0.0637 | 95 | 1.2735 | 5090832 |
| 0.3009 | 0.0670 | 100 | 1.2615 | 5360992 |
| 0.2897 | 0.0704 | 105 | 1.2401 | 5622952 |
| 0.3031 | 0.0737 | 110 | 1.2327 | 5893760 |
| 0.267 | 0.0771 | 115 | 1.2390 | 6160416 |
| 0.1885 | 0.0804 | 120 | 1.2295 | 6420392 |
| 0.2927 | 0.0838 | 125 | 1.2368 | 6687048 |
| 0.2723 | 0.0871 | 130 | 1.2207 | 6953928 |
| 0.1628 | 0.0905 | 135 | 1.2179 | 7217136 |
| 0.2065 | 0.0938 | 140 | 1.2198 | 7492048 |
| 0.2944 | 0.0972 | 145 | 1.2066 | 7765392 |
| 0.2223 | 0.1005 | 150 | 1.2093 | 8027928 |
| 0.2249 | 0.1039 | 155 | 1.2120 | 8292184 |
| 0.1929 | 0.1072 | 160 | 1.2120 | 8562544 |
| 0.1456 | 0.1106 | 165 | 1.2061 | 8827072 |
| 0.2167 | 0.1139 | 170 | 1.2038 | 9092776 |
| 0.1924 | 0.1173 | 175 | 1.2042 | 9360936 |
| 0.1899 | 0.1206 | 180 | 1.2034 | 9633344 |
| 0.2269 | 0.1240 | 185 | 1.2013 | 9901560 |
| 0.2109 | 0.1273 | 190 | 1.1982 | 10165152 |
| 0.2176 | 0.1307 | 195 | 1.1943 | 10436048 |
| 0.2036 | 0.1340 | 200 | 1.2011 | 10710296 |
| 0.2032 | 0.1374 | 205 | 1.1929 | 10976448 |
| 0.2137 | 0.1408 | 210 | 1.1878 | 11244528 |
| 0.2022 | 0.1441 | 215 | 1.2060 | 11514456 |
| 0.1817 | 0.1475 | 220 | 1.1899 | 11786632 |
| 0.1818 | 0.1508 | 225 | 1.1885 | 12057624 |
| 0.207 | 0.1542 | 230 | 1.1934 | 12322552 |
| 0.1666 | 0.1575 | 235 | 1.1850 | 12585856 |
| 0.2202 | 0.1609 | 240 | 1.1893 | 12852128 |
| 0.1047 | 0.1642 | 245 | 1.1861 | 13110664 |
| 0.1657 | 0.1676 | 250 | 1.1846 | 13377760 |
| 0.1537 | 0.1709 | 255 | 1.1825 | 13648656 |
| 0.2295 | 0.1743 | 260 | 1.1760 | 13914640 |
| 0.2343 | 0.1776 | 265 | 1.1789 | 14184224 |
| 0.203 | 0.1810 | 270 | 1.1754 | 14456352 |
| 0.224 | 0.1843 | 275 | 1.1786 | 14717864 |
| 0.2238 | 0.1877 | 280 | 1.1827 | 14983528 |
| 0.1292 | 0.1910 | 285 | 1.1735 | 15255928 |
| 0.1863 | 0.1944 | 290 | 1.1803 | 15533208 |
| 0.2342 | 0.1977 | 295 | 1.1769 | 15802280 |
| 0.1469 | 0.2011 | 300 | 1.1709 | 16066840 |
| 0.2217 | 0.2044 | 305 | 1.1773 | 16331440 |
| 0.1216 | 0.2078 | 310 | 1.1730 | 16602072 |
| 0.141 | 0.2111 | 315 | 1.1691 | 16868128 |
| 0.2002 | 0.2145 | 320 | 1.1731 | 17144752 |
| 0.1174 | 0.2178 | 325 | 1.1697 | 17414344 |
| 0.1889 | 0.2212 | 330 | 1.1723 | 17684576 |
| 0.0913 | 0.2245 | 335 | 1.1702 | 17951304 |
| 0.214 | 0.2279 | 340 | 1.1651 | 18220032 |
| 0.1673 | 0.2312 | 345 | 1.1696 | 18492888 |
| 0.1793 | 0.2346 | 350 | 1.1636 | 18760640 |
| 0.1596 | 0.2379 | 355 | 1.1649 | 19030104 |
| 0.1975 | 0.2413 | 360 | 1.1595 | 19307320 |
| 0.1438 | 0.2446 | 365 | 1.1595 | 19569584 |
| 0.138 | 0.2480 | 370 | 1.1639 | 19840360 |
| 0.1538 | 0.2513 | 375 | 1.1610 | 20110384 |
| 0.1479 | 0.2547 | 380 | 1.1592 | 20376368 |
| 0.1358 | 0.2580 | 385 | 1.1668 | 20643072 |
| 0.1557 | 0.2614 | 390 | 1.1628 | 20907272 |
| 0.111 | 0.2647 | 395 | 1.1578 | 21180016 |
| 0.1675 | 0.2681 | 400 | 1.1590 | 21456112 |
| 0.1786 | 0.2714 | 405 | 1.1598 | 21715968 |
| 0.1802 | 0.2748 | 410 | 1.1557 | 21988960 |
| 0.1736 | 0.2782 | 415 | 1.1550 | 22258920 |
| 0.0973 | 0.2815 | 420 | 1.1579 | 22525336 |
| 0.1362 | 0.2849 | 425 | 1.1544 | 22795792 |
| 0.0962 | 0.2882 | 430 | 1.1570 | 23061320 |
| 0.1419 | 0.2916 | 435 | 1.1538 | 23323392 |
| 0.1029 | 0.2949 | 440 | 1.1519 | 23586896 |
| 0.1208 | 0.2983 | 445 | 1.1582 | 23855528 |
| 0.1759 | 0.3016 | 450 | 1.1539 | 24125992 |
| 0.1998 | 0.3050 | 455 | 1.1537 | 24395848 |
| 0.1461 | 0.3083 | 460 | 1.1540 | 24657512 |
| 0.1449 | 0.3117 | 465 | 1.1544 | 24930056 |
| 0.1348 | 0.3150 | 470 | 1.1521 | 25205504 |
| 0.1089 | 0.3184 | 475 | 1.1524 | 25476824 |
| 0.1291 | 0.3217 | 480 | 1.1505 | 25753432 |
| 0.1081 | 0.3251 | 485 | 1.1483 | 26016568 |
| 0.0861 | 0.3284 | 490 | 1.1506 | 26287432 |
| 0.0865 | 0.3318 | 495 | 1.1503 | 26555008 |
| 0.1932 | 0.3351 | 500 | 1.1480 | 26825528 |
| 0.1291 | 0.3385 | 505 | 1.1502 | 27095384 |
| 0.1115 | 0.3418 | 510 | 1.1490 | 27358264 |
| 0.1187 | 0.3452 | 515 | 1.1461 | 27629248 |
| 0.1786 | 0.3485 | 520 | 1.1457 | 27898328 |
| 0.1981 | 0.3519 | 525 | 1.1444 | 28168080 |
| 0.0757 | 0.3552 | 530 | 1.1424 | 28440184 |
| 0.1238 | 0.3586 | 535 | 1.1430 | 28703544 |
| 0.1891 | 0.3619 | 540 | 1.1450 | 28975592 |
| 0.1439 | 0.3653 | 545 | 1.1415 | 29241288 |
| 0.1241 | 0.3686 | 550 | 1.1400 | 29510576 |
| 0.1831 | 0.3720 | 555 | 1.1479 | 29781256 |
| 0.1603 | 0.3753 | 560 | 1.1442 | 30044872 |
| 0.1531 | 0.3787 | 565 | 1.1377 | 30316576 |
| 0.175 | 0.3820 | 570 | 1.1398 | 30584056 |
| 0.1243 | 0.3854 | 575 | 1.1465 | 30849976 |
| 0.1712 | 0.3887 | 580 | 1.1401 | 31116344 |
| 0.1897 | 0.3921 | 585 | 1.1377 | 31380224 |
| 0.103 | 0.3954 | 590 | 1.1433 | 31644296 |
| 0.1598 | 0.3988 | 595 | 1.1441 | 31910880 |
| 0.0935 | 0.4021 | 600 | 1.1403 | 32180992 |
| 0.149 | 0.4055 | 605 | 1.1398 | 32450976 |
| 0.1138 | 0.4088 | 610 | 1.1395 | 32722640 |
| 0.0941 | 0.4122 | 615 | 1.1385 | 32992768 |
| 0.1376 | 0.4155 | 620 | 1.1382 | 33253680 |
| 0.1818 | 0.4189 | 625 | 1.1366 | 33522208 |
| 0.1314 | 0.4223 | 630 | 1.1411 | 33790624 |
| 0.143 | 0.4256 | 635 | 1.1428 | 34065440 |
| 0.1482 | 0.4290 | 640 | 1.1407 | 34330152 |
| 0.0906 | 0.4323 | 645 | 1.1402 | 34598840 |
| 0.1593 | 0.4357 | 650 | 1.1370 | 34866552 |
| 0.1785 | 0.4390 | 655 | 1.1354 | 35140528 |
| 0.1804 | 0.4424 | 660 | 1.1346 | 35405720 |
| 0.1543 | 0.4457 | 665 | 1.1343 | 35675432 |
| 0.1364 | 0.4491 | 670 | 1.1346 | 35940744 |
| 0.1464 | 0.4524 | 675 | 1.1355 | 36211864 |
| 0.1241 | 0.4558 | 680 | 1.1349 | 36476808 |
| 0.0951 | 0.4591 | 685 | 1.1358 | 36744672 |
| 0.1708 | 0.4625 | 690 | 1.1336 | 37010368 |
| 0.1309 | 0.4658 | 695 | 1.1330 | 37280120 |
| 0.1281 | 0.4692 | 700 | 1.1354 | 37545376 |
| 0.0862 | 0.4725 | 705 | 1.1368 | 37814296 |
| 0.1468 | 0.4759 | 710 | 1.1352 | 38083688 |
| 0.1626 | 0.4792 | 715 | 1.1322 | 38355008 |
| 0.1875 | 0.4826 | 720 | 1.1290 | 38621384 |
| 0.2071 | 0.4859 | 725 | 1.1291 | 38893296 |
| 0.1727 | 0.4893 | 730 | 1.1331 | 39169168 |
| 0.1585 | 0.4926 | 735 | 1.1320 | 39439888 |
| 0.1277 | 0.4960 | 740 | 1.1295 | 39708088 |
| 0.1722 | 0.4993 | 745 | 1.1305 | 39972080 |
| 0.1461 | 0.5027 | 750 | 1.1309 | 40239712 |
| 0.1701 | 0.5060 | 755 | 1.1315 | 40504272 |
| 0.1307 | 0.5094 | 760 | 1.1301 | 40764944 |
| 0.121 | 0.5127 | 765 | 1.1276 | 41038032 |
| 0.16 | 0.5161 | 770 | 1.1292 | 41306160 |
| 0.1852 | 0.5194 | 775 | 1.1274 | 41579224 |
| 0.1913 | 0.5228 | 780 | 1.1295 | 41848968 |
| 0.1181 | 0.5261 | 785 | 1.1306 | 42114688 |
| 0.0728 | 0.5295 | 790 | 1.1271 | 42387872 |
| 0.1058 | 0.5328 | 795 | 1.1254 | 42654328 |
| 0.1164 | 0.5362 | 800 | 1.1276 | 42920744 |
| 0.1113 | 0.5395 | 805 | 1.1273 | 43187368 |
| 0.1454 | 0.5429 | 810 | 1.1261 | 43450184 |
| 0.2061 | 0.5462 | 815 | 1.1254 | 43722824 |
| 0.1736 | 0.5496 | 820 | 1.1248 | 43989960 |
| 0.2034 | 0.5529 | 825 | 1.1254 | 44259280 |
| 0.1672 | 0.5563 | 830 | 1.1265 | 44530192 |
| 0.1496 | 0.5597 | 835 | 1.1248 | 44789728 |
| 0.1813 | 0.5630 | 840 | 1.1260 | 45057424 |
| 0.1334 | 0.5664 | 845 | 1.1252 | 45331952 |
| 0.1284 | 0.5697 | 850 | 1.1225 | 45590128 |
| 0.1436 | 0.5731 | 855 | 1.1250 | 45863016 |
| 0.0975 | 0.5764 | 860 | 1.1265 | 46131064 |
| 0.1654 | 0.5798 | 865 | 1.1237 | 46397936 |
| 0.1752 | 0.5831 | 870 | 1.1238 | 46670968 |
| 0.1714 | 0.5865 | 875 | 1.1266 | 46929320 |
| 0.1759 | 0.5898 | 880 | 1.1235 | 47198392 |
| 0.204 | 0.5932 | 885 | 1.1227 | 47463024 |
| 0.1457 | 0.5965 | 890 | 1.1232 | 47731832 |
| 0.1134 | 0.5999 | 895 | 1.1233 | 47996312 |
| 0.1137 | 0.6032 | 900 | 1.1233 | 48269520 |
| 0.1665 | 0.6066 | 905 | 1.1231 | 48540448 |
| 0.2046 | 0.6099 | 910 | 1.1223 | 48809000 |
| 0.1132 | 0.6133 | 915 | 1.1230 | 49071056 |
| 0.139 | 0.6166 | 920 | 1.1239 | 49338560 |
| 0.1287 | 0.6200 | 925 | 1.1192 | 49606800 |
| 0.2251 | 0.6233 | 930 | 1.1193 | 49874384 |
| 0.101 | 0.6267 | 935 | 1.1220 | 50142344 |
| 0.1121 | 0.6300 | 940 | 1.1198 | 50402984 |
| 0.1112 | 0.6334 | 945 | 1.1214 | 50671928 |
| 0.1593 | 0.6367 | 950 | 1.1214 | 50934832 |
| 0.1577 | 0.6401 | 955 | 1.1190 | 51205280 |
| 0.1041 | 0.6434 | 960 | 1.1214 | 51473608 |
| 0.1846 | 0.6468 | 965 | 1.1203 | 51737904 |
| 0.1015 | 0.6501 | 970 | 1.1193 | 52006464 |
| 0.146 | 0.6535 | 975 | 1.1209 | 52267624 |
| 0.1345 | 0.6568 | 980 | 1.1206 | 52531304 |
| 0.0711 | 0.6602 | 985 | 1.1201 | 52793224 |
| 0.1453 | 0.6635 | 990 | 1.1193 | 53058696 |
| 0.1583 | 0.6669 | 995 | 1.1180 | 53320480 |
| 0.1729 | 0.6702 | 1000 | 1.1202 | 53585272 |
| 0.1337 | 0.6736 | 1005 | 1.1196 | 53845488 |
| 0.1435 | 0.6769 | 1010 | 1.1184 | 54113488 |
| 0.159 | 0.6803 | 1015 | 1.1175 | 54383920 |
| 0.0812 | 0.6836 | 1020 | 1.1195 | 54654016 |
| 0.1215 | 0.6870 | 1025 | 1.1188 | 54918328 |
| 0.1043 | 0.6903 | 1030 | 1.1208 | 55187872 |
| 0.1295 | 0.6937 | 1035 | 1.1199 | 55453544 |
| 0.1549 | 0.6971 | 1040 | 1.1181 | 55723880 |
| 0.0889 | 0.7004 | 1045 | 1.1177 | 55989552 |
| 0.2117 | 0.7038 | 1050 | 1.1188 | 56259544 |
| 0.1109 | 0.7071 | 1055 | 1.1190 | 56523424 |
| 0.192 | 0.7105 | 1060 | 1.1167 | 56795104 |
| 0.1341 | 0.7138 | 1065 | 1.1158 | 57060504 |
| 0.1371 | 0.7172 | 1070 | 1.1197 | 57323624 |
| 0.1697 | 0.7205 | 1075 | 1.1192 | 57594960 |
| 0.1499 | 0.7239 | 1080 | 1.1152 | 57855808 |
| 0.1167 | 0.7272 | 1085 | 1.1166 | 58120184 |
| 0.1635 | 0.7306 | 1090 | 1.1182 | 58392112 |
| 0.1055 | 0.7339 | 1095 | 1.1175 | 58651496 |
| 0.1141 | 0.7373 | 1100 | 1.1180 | 58919248 |
| 0.1839 | 0.7406 | 1105 | 1.1175 | 59191368 |
| 0.1006 | 0.7440 | 1110 | 1.1165 | 59459688 |
| 0.0973 | 0.7473 | 1115 | 1.1189 | 59730256 |
| 0.0743 | 0.7507 | 1120 | 1.1202 | 59992008 |
| 0.1288 | 0.7540 | 1125 | 1.1167 | 60252768 |
| 0.1671 | 0.7574 | 1130 | 1.1146 | 60520584 |
| 0.1495 | 0.7607 | 1135 | 1.1154 | 60790640 |
| 0.107 | 0.7641 | 1140 | 1.1153 | 61055816 |
| 0.1913 | 0.7674 | 1145 | 1.1154 | 61313816 |
| 0.1957 | 0.7708 | 1150 | 1.1171 | 61582952 |
| 0.1617 | 0.7741 | 1155 | 1.1167 | 61857360 |
| 0.1426 | 0.7775 | 1160 | 1.1145 | 62129144 |
| 0.146 | 0.7808 | 1165 | 1.1133 | 62394176 |
| 0.1061 | 0.7842 | 1170 | 1.1158 | 62662272 |
| 0.1836 | 0.7875 | 1175 | 1.1186 | 62936744 |
| 0.2102 | 0.7909 | 1180 | 1.1140 | 63207800 |
| 0.1363 | 0.7942 | 1185 | 1.1130 | 63476664 |
| 0.1142 | 0.7976 | 1190 | 1.1170 | 63744256 |
| 0.1063 | 0.8009 | 1195 | 1.1149 | 64015136 |
| 0.1651 | 0.8043 | 1200 | 1.1113 | 64288704 |
| 0.1959 | 0.8076 | 1205 | 1.1129 | 64553904 |
| 0.1225 | 0.8110 | 1210 | 1.1127 | 64826656 |
| 0.1741 | 0.8143 | 1215 | 1.1129 | 65094592 |
| 0.146 | 0.8177 | 1220 | 1.1133 | 65369856 |
| 0.1116 | 0.8210 | 1225 | 1.1119 | 65637136 |
| 0.1747 | 0.8244 | 1230 | 1.1099 | 65903504 |
| 0.1407 | 0.8277 | 1235 | 1.1127 | 66157928 |
| 0.1356 | 0.8311 | 1240 | 1.1125 | 66425296 |
| 0.1248 | 0.8345 | 1245 | 1.1106 | 66694200 |
| 0.1102 | 0.8378 | 1250 | 1.1119 | 66961800 |
| 0.1173 | 0.8412 | 1255 | 1.1114 | 67231456 |
| 0.1191 | 0.8445 | 1260 | 1.1093 | 67497624 |
| 0.0908 | 0.8479 | 1265 | 1.1109 | 67766744 |
| 0.0833 | 0.8512 | 1270 | 1.1114 | 68031568 |
| 0.1309 | 0.8546 | 1275 | 1.1111 | 68300824 |
| 0.1777 | 0.8579 | 1280 | 1.1106 | 68569472 |
| 0.1389 | 0.8613 | 1285 | 1.1110 | 68835464 |
| 0.1253 | 0.8646 | 1290 | 1.1117 | 69100104 |
| 0.0803 | 0.8680 | 1295 | 1.1111 | 69372216 |
| 0.1408 | 0.8713 | 1300 | 1.1109 | 69636520 |
| 0.1631 | 0.8747 | 1305 | 1.1114 | 69894608 |
| 0.17 | 0.8780 | 1310 | 1.1116 | 70161624 |
| 0.1352 | 0.8814 | 1315 | 1.1125 | 70421752 |
| 0.1529 | 0.8847 | 1320 | 1.1136 | 70688696 |
| 0.0832 | 0.8881 | 1325 | 1.1110 | 70951192 |
| 0.1542 | 0.8914 | 1330 | 1.1088 | 71222600 |
| 0.1263 | 0.8948 | 1335 | 1.1082 | 71494272 |
| 0.1641 | 0.8981 | 1340 | 1.1095 | 71765240 |
| 0.1097 | 0.9015 | 1345 | 1.1107 | 72035120 |
| 0.1506 | 0.9048 | 1350 | 1.1085 | 72304632 |
| 0.1136 | 0.9082 | 1355 | 1.1065 | 72571936 |
| 0.1625 | 0.9115 | 1360 | 1.1068 | 72844280 |
| 0.1349 | 0.9149 | 1365 | 1.1083 | 73107600 |
| 0.139 | 0.9182 | 1370 | 1.1097 | 73375912 |
| 0.1872 | 0.9216 | 1375 | 1.1096 | 73651128 |
| 0.088 | 0.9249 | 1380 | 1.1088 | 73924384 |
| 0.0991 | 0.9283 | 1385 | 1.1088 | 74191232 |
| 0.1135 | 0.9316 | 1390 | 1.1086 | 74462488 |
| 0.1887 | 0.9350 | 1395 | 1.1089 | 74730824 |
| 0.1564 | 0.9383 | 1400 | 1.1088 | 75000416 |
| 0.1177 | 0.9417 | 1405 | 1.1084 | 75271672 |
| 0.1479 | 0.9450 | 1410 | 1.1077 | 75539632 |
| 0.1473 | 0.9484 | 1415 | 1.1095 | 75810152 |
| 0.1479 | 0.9517 | 1420 | 1.1089 | 76078600 |
| 0.0881 | 0.9551 | 1425 | 1.1074 | 76342216 |
| 0.1072 | 0.9584 | 1430 | 1.1085 | 76601200 |
| 0.1121 | 0.9618 | 1435 | 1.1111 | 76868160 |
| 0.1739 | 0.9651 | 1440 | 1.1101 | 77140680 |
| 0.0968 | 0.9685 | 1445 | 1.1073 | 77414736 |
| 0.1518 | 0.9718 | 1450 | 1.1070 | 77689648 |
| 0.1003 | 0.9752 | 1455 | 1.1066 | 77952712 |
| 0.1699 | 0.9786 | 1460 | 1.1060 | 78224928 |
| 0.1503 | 0.9819 | 1465 | 1.1052 | 78498536 |
| 0.1964 | 0.9853 | 1470 | 1.1069 | 78765120 |
| 0.1299 | 0.9886 | 1475 | 1.1097 | 79032872 |
| 0.1241 | 0.9920 | 1480 | 1.1092 | 79301528 |
| 0.1127 | 0.9953 | 1485 | 1.1062 | 79565296 |
| 0.1648 | 0.9987 | 1490 | 1.1053 | 79830192 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
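
The training hyperparameters and the last logged row of the training results can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original training code: the variable names are illustrative, and the formulas (total batch = per-device batch × accumulation steps × devices; warmup steps = ceiling of ratio × total steps; steps proportional to the epoch fraction) are assumptions about how these values relate, not something this card states.

```python
import math

# Values copied from the "Training hyperparameters" list in this card.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Values copied from the last logged row of the training results table.
last_step = 1490
last_epoch = 0.9987
last_tokens_seen = 79830192

# Assumption: total batch = per-device batch * accumulation steps * devices.
num_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Assumption: optimizer steps grow linearly with the epoch fraction,
# so one full epoch is roughly last_step / last_epoch steps.
estimated_total_steps = round(last_step / last_epoch)

# Assumption: warmup length is the ceiling of ratio * total steps.
estimated_warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)

# Average number of input tokens consumed per optimizer step so far.
avg_tokens_per_step = last_tokens_seen / last_step

print(f"devices implied by batch sizes: {num_devices}")
print(f"estimated steps in one epoch:   {estimated_total_steps}")
print(f"estimated warmup steps:         {estimated_warmup_steps}")
print(f"average tokens per step:        {avg_tokens_per_step:.0f}")
```

Under these assumptions the batch sizes are consistent with a single device, and the warmup phase of the `constant_with_warmup` schedule would cover roughly the first 5% of the logged steps.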