---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1055
- Num Input Tokens Seen: 67259368
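The exact Hub location of this checkpoint is not stated in the card, so the snippet below is only a hedged loading sketch; `model_id` is a placeholder for wherever the weights actually live (a local output directory or a Hub repo id).

```python
# Hedged inference sketch; model_id is a placeholder, not a confirmed repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd2"  # placeholder: local path or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```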
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
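A hedged sketch of how these settings map onto a TRL `SFTConfig`/`SFTTrainer` run. The training dataset is not documented in this card, so `"your/sft-dataset"` and the `"text"` column are placeholders, and the snippet assumes a TRL release contemporaneous with Transformers 4.44 (roughly trl 0.9.x, which provides `SFTConfig`); it is not the original training script.

```python
# Hedged reconstruction of the listed hyperparameters; dataset and TRL version are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("your/sft-dataset", split="train")  # placeholder dataset

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,  # eval dataset/strategy are not documented in this card
    gradient_accumulation_steps=16,  # 8 per device x 16 steps x 1 device = total batch size 128
    num_train_epochs=1,
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    dataset_text_field="text",       # assumes a plain-text column named "text"
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model; TRL loads it with AutoModelForCausalLM
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```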
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5671 | 0.0040 | 5 | 1.3893 | 272824 |
| 1.6849 | 0.0080 | 10 | 1.3735 | 539200 |
| 1.5548 | 0.0121 | 15 | 1.3410 | 813336 |
| 1.5425 | 0.0161 | 20 | 1.2875 | 1083664 |
| 1.4656 | 0.0201 | 25 | 1.2496 | 1352864 |
| 1.3104 | 0.0241 | 30 | 1.2188 | 1627144 |
| 1.2102 | 0.0281 | 35 | 1.1918 | 1903616 |
| 1.0365 | 0.0322 | 40 | 1.2027 | 2179440 |
| 0.9723 | 0.0362 | 45 | 1.2295 | 2452416 |
| 0.8749 | 0.0402 | 50 | 1.2301 | 2723624 |
| 0.7583 | 0.0442 | 55 | 1.2641 | 2995552 |
| 0.6034 | 0.0482 | 60 | 1.2442 | 3263112 |
| 0.6268 | 0.0523 | 65 | 1.2775 | 3532048 |
| 0.5245 | 0.0563 | 70 | 1.2555 | 3803712 |
| 0.5529 | 0.0603 | 75 | 1.2545 | 4067952 |
| 0.451 | 0.0643 | 80 | 1.2311 | 4341488 |
| 0.3354 | 0.0683 | 85 | 1.2287 | 4606360 |
| 0.3631 | 0.0724 | 90 | 1.2392 | 4870680 |
| 0.3391 | 0.0764 | 95 | 1.2107 | 5141880 |
| 0.3349 | 0.0804 | 100 | 1.2144 | 5419064 |
| 0.3393 | 0.0844 | 105 | 1.2027 | 5684304 |
| 0.3478 | 0.0885 | 110 | 1.2009 | 5960032 |
| 0.3301 | 0.0925 | 115 | 1.2019 | 6233704 |
| 0.3595 | 0.0965 | 120 | 1.1983 | 6513328 |
| 0.2708 | 0.1005 | 125 | 1.1992 | 6787368 |
| 0.3066 | 0.1045 | 130 | 1.1890 | 7058856 |
| 0.3377 | 0.1086 | 135 | 1.1993 | 7326624 |
| 0.2734 | 0.1126 | 140 | 1.1889 | 7592432 |
| 0.2268 | 0.1166 | 145 | 1.1897 | 7861688 |
| 0.1791 | 0.1206 | 150 | 1.1898 | 8135448 |
| 0.2283 | 0.1246 | 155 | 1.1914 | 8402560 |
| 0.2149 | 0.1287 | 160 | 1.1828 | 8676568 |
| 0.2341 | 0.1327 | 165 | 1.1826 | 8955080 |
| 0.1841 | 0.1367 | 170 | 1.1806 | 9223800 |
| 0.2142 | 0.1407 | 175 | 1.1779 | 9498448 |
| 0.2336 | 0.1447 | 180 | 1.1816 | 9774656 |
| 0.2968 | 0.1488 | 185 | 1.1765 | 10048784 |
| 0.1364 | 0.1528 | 190 | 1.1827 | 10319264 |
| 0.2182 | 0.1568 | 195 | 1.1731 | 10587344 |
| 0.234 | 0.1608 | 200 | 1.1809 | 10855224 |
| 0.2346 | 0.1648 | 205 | 1.1748 | 11119840 |
| 0.2163 | 0.1689 | 210 | 1.1755 | 11385120 |
| 0.1277 | 0.1729 | 215 | 1.1708 | 11649928 |
| 0.2155 | 0.1769 | 220 | 1.1753 | 11919808 |
| 0.2224 | 0.1809 | 225 | 1.1692 | 12187272 |
| 0.2839 | 0.1849 | 230 | 1.1682 | 12449784 |
| 0.2706 | 0.1890 | 235 | 1.1657 | 12713640 |
| 0.208 | 0.1930 | 240 | 1.1663 | 12981568 |
| 0.3348 | 0.1970 | 245 | 1.1668 | 13246088 |
| 0.2328 | 0.2010 | 250 | 1.1643 | 13527032 |
| 0.21 | 0.2050 | 255 | 1.1641 | 13794352 |
| 0.2265 | 0.2091 | 260 | 1.1654 | 14060808 |
| 0.2371 | 0.2131 | 265 | 1.1611 | 14326568 |
| 0.1337 | 0.2171 | 270 | 1.1660 | 14596256 |
| 0.1614 | 0.2211 | 275 | 1.1640 | 14863360 |
| 0.156 | 0.2251 | 280 | 1.1612 | 15131520 |
| 0.2243 | 0.2292 | 285 | 1.1625 | 15396504 |
| 0.1061 | 0.2332 | 290 | 1.1599 | 15662048 |
| 0.2318 | 0.2372 | 295 | 1.1587 | 15924504 |
| 0.2002 | 0.2412 | 300 | 1.1617 | 16188520 |
| 0.1501 | 0.2453 | 305 | 1.1567 | 16462088 |
| 0.164 | 0.2493 | 310 | 1.1586 | 16728552 |
| 0.1685 | 0.2533 | 315 | 1.1593 | 16997816 |
| 0.2137 | 0.2573 | 320 | 1.1597 | 17271304 |
| 0.2127 | 0.2613 | 325 | 1.1538 | 17545976 |
| 0.2043 | 0.2654 | 330 | 1.1637 | 17818488 |
| 0.2089 | 0.2694 | 335 | 1.1541 | 18087784 |
| 0.2464 | 0.2734 | 340 | 1.1537 | 18359592 |
| 0.2208 | 0.2774 | 345 | 1.1545 | 18632376 |
| 0.1978 | 0.2814 | 350 | 1.1534 | 18901080 |
| 0.1607 | 0.2855 | 355 | 1.1560 | 19166664 |
| 0.1316 | 0.2895 | 360 | 1.1539 | 19433808 |
| 0.1762 | 0.2935 | 365 | 1.1498 | 19707816 |
| 0.2698 | 0.2975 | 370 | 1.1493 | 19974784 |
| 0.1259 | 0.3015 | 375 | 1.1471 | 20245472 |
| 0.1371 | 0.3056 | 380 | 1.1475 | 20525528 |
| 0.2212 | 0.3096 | 385 | 1.1480 | 20802288 |
| 0.2278 | 0.3136 | 390 | 1.1473 | 21064528 |
| 0.1991 | 0.3176 | 395 | 1.1484 | 21333008 |
| 0.1766 | 0.3216 | 400 | 1.1453 | 21608208 |
| 0.129 | 0.3257 | 405 | 1.1489 | 21881824 |
| 0.1451 | 0.3297 | 410 | 1.1449 | 22153952 |
| 0.1526 | 0.3337 | 415 | 1.1432 | 22427152 |
| 0.2111 | 0.3377 | 420 | 1.1434 | 22694304 |
| 0.1552 | 0.3417 | 425 | 1.1449 | 22967072 |
| 0.2009 | 0.3458 | 430 | 1.1419 | 23234792 |
| 0.1275 | 0.3498 | 435 | 1.1435 | 23509544 |
| 0.1635 | 0.3538 | 440 | 1.1424 | 23780264 |
| 0.1961 | 0.3578 | 445 | 1.1379 | 24049528 |
| 0.1363 | 0.3618 | 450 | 1.1440 | 24327024 |
| 0.1557 | 0.3659 | 455 | 1.1421 | 24597640 |
| 0.1438 | 0.3699 | 460 | 1.1379 | 24860472 |
| 0.2417 | 0.3739 | 465 | 1.1393 | 25133472 |
| 0.1708 | 0.3779 | 470 | 1.1363 | 25405592 |
| 0.1151 | 0.3819 | 475 | 1.1423 | 25672128 |
| 0.1869 | 0.3860 | 480 | 1.1394 | 25937304 |
| 0.1781 | 0.3900 | 485 | 1.1371 | 26209136 |
| 0.1838 | 0.3940 | 490 | 1.1383 | 26481296 |
| 0.189 | 0.3980 | 495 | 1.1367 | 26752808 |
| 0.1679 | 0.4021 | 500 | 1.1336 | 27019792 |
| 0.0757 | 0.4061 | 505 | 1.1386 | 27288800 |
| 0.1733 | 0.4101 | 510 | 1.1366 | 27554256 |
| 0.1756 | 0.4141 | 515 | 1.1338 | 27831136 |
| 0.1946 | 0.4181 | 520 | 1.1366 | 28100856 |
| 0.188 | 0.4222 | 525 | 1.1330 | 28369880 |
| 0.1342 | 0.4262 | 530 | 1.1344 | 28644496 |
| 0.1069 | 0.4302 | 535 | 1.1356 | 28909128 |
| 0.1664 | 0.4342 | 540 | 1.1350 | 29183120 |
| 0.1259 | 0.4382 | 545 | 1.1349 | 29449168 |
| 0.1821 | 0.4423 | 550 | 1.1306 | 29719296 |
| 0.1504 | 0.4463 | 555 | 1.1333 | 29998376 |
| 0.1849 | 0.4503 | 560 | 1.1339 | 30265000 |
| 0.1199 | 0.4543 | 565 | 1.1305 | 30539552 |
| 0.1379 | 0.4583 | 570 | 1.1315 | 30808552 |
| 0.1908 | 0.4624 | 575 | 1.1320 | 31085144 |
| 0.1671 | 0.4664 | 580 | 1.1316 | 31350720 |
| 0.1946 | 0.4704 | 585 | 1.1303 | 31619488 |
| 0.1132 | 0.4744 | 590 | 1.1300 | 31890024 |
| 0.1649 | 0.4784 | 595 | 1.1296 | 32158616 |
| 0.1743 | 0.4825 | 600 | 1.1289 | 32424064 |
| 0.1583 | 0.4865 | 605 | 1.1268 | 32691920 |
| 0.2174 | 0.4905 | 610 | 1.1305 | 32960456 |
| 0.1992 | 0.4945 | 615 | 1.1311 | 33228408 |
| 0.1422 | 0.4985 | 620 | 1.1280 | 33498080 |
| 0.2044 | 0.5026 | 625 | 1.1322 | 33770464 |
| 0.1475 | 0.5066 | 630 | 1.1341 | 34036664 |
| 0.2034 | 0.5106 | 635 | 1.1277 | 34305904 |
| 0.191 | 0.5146 | 640 | 1.1277 | 34575208 |
| 0.1587 | 0.5186 | 645 | 1.1287 | 34850104 |
| 0.2476 | 0.5227 | 650 | 1.1249 | 35116504 |
| 0.2104 | 0.5267 | 655 | 1.1235 | 35393896 |
| 0.1535 | 0.5307 | 660 | 1.1271 | 35668016 |
| 0.1741 | 0.5347 | 665 | 1.1279 | 35943656 |
| 0.2061 | 0.5387 | 670 | 1.1240 | 36214360 |
| 0.1185 | 0.5428 | 675 | 1.1265 | 36481112 |
| 0.1764 | 0.5468 | 680 | 1.1249 | 36749112 |
| 0.1545 | 0.5508 | 685 | 1.1244 | 37024504 |
| 0.0851 | 0.5548 | 690 | 1.1291 | 37299576 |
| 0.1687 | 0.5589 | 695 | 1.1281 | 37569920 |
| 0.1989 | 0.5629 | 700 | 1.1243 | 37842896 |
| 0.1796 | 0.5669 | 705 | 1.1256 | 38116584 |
| 0.1833 | 0.5709 | 710 | 1.1240 | 38390296 |
| 0.1043 | 0.5749 | 715 | 1.1225 | 38659752 |
| 0.1557 | 0.5790 | 720 | 1.1230 | 38935632 |
| 0.15 | 0.5830 | 725 | 1.1246 | 39203824 |
| 0.1298 | 0.5870 | 730 | 1.1232 | 39473296 |
| 0.1411 | 0.5910 | 735 | 1.1229 | 39741784 |
| 0.147 | 0.5950 | 740 | 1.1204 | 40009360 |
| 0.2156 | 0.5991 | 745 | 1.1213 | 40282160 |
| 0.1898 | 0.6031 | 750 | 1.1213 | 40548488 |
| 0.1643 | 0.6071 | 755 | 1.1206 | 40817320 |
| 0.1633 | 0.6111 | 760 | 1.1194 | 41089992 |
| 0.2122 | 0.6151 | 765 | 1.1179 | 41361824 |
| 0.144 | 0.6192 | 770 | 1.1214 | 41629032 |
| 0.157 | 0.6232 | 775 | 1.1204 | 41903408 |
| 0.1663 | 0.6272 | 780 | 1.1181 | 42177056 |
| 0.1367 | 0.6312 | 785 | 1.1184 | 42442704 |
| 0.1402 | 0.6352 | 790 | 1.1200 | 42709072 |
| 0.1044 | 0.6393 | 795 | 1.1179 | 42983656 |
| 0.144 | 0.6433 | 800 | 1.1194 | 43254328 |
| 0.1364 | 0.6473 | 805 | 1.1194 | 43520408 |
| 0.1167 | 0.6513 | 810 | 1.1207 | 43798248 |
| 0.1429 | 0.6553 | 815 | 1.1164 | 44062968 |
| 0.2173 | 0.6594 | 820 | 1.1190 | 44331392 |
| 0.1875 | 0.6634 | 825 | 1.1198 | 44599096 |
| 0.2148 | 0.6674 | 830 | 1.1168 | 44875096 |
| 0.1699 | 0.6714 | 835 | 1.1172 | 45141816 |
| 0.1539 | 0.6754 | 840 | 1.1186 | 45410320 |
| 0.1526 | 0.6795 | 845 | 1.1158 | 45682664 |
| 0.1549 | 0.6835 | 850 | 1.1160 | 45949128 |
| 0.1646 | 0.6875 | 855 | 1.1175 | 46216296 |
| 0.1656 | 0.6915 | 860 | 1.1159 | 46485272 |
| 0.1019 | 0.6955 | 865 | 1.1151 | 46756512 |
| 0.1653 | 0.6996 | 870 | 1.1183 | 47019768 |
| 0.1772 | 0.7036 | 875 | 1.1163 | 47293512 |
| 0.1266 | 0.7076 | 880 | 1.1179 | 47567472 |
| 0.1419 | 0.7116 | 885 | 1.1160 | 47843136 |
| 0.1546 | 0.7156 | 890 | 1.1134 | 48110400 |
| 0.1465 | 0.7197 | 895 | 1.1148 | 48378768 |
| 0.1887 | 0.7237 | 900 | 1.1161 | 48647776 |
| 0.1071 | 0.7277 | 905 | 1.1143 | 48920112 |
| 0.1604 | 0.7317 | 910 | 1.1151 | 49190544 |
| 0.136 | 0.7358 | 915 | 1.1167 | 49459456 |
| 0.2092 | 0.7398 | 920 | 1.1136 | 49732632 |
| 0.1856 | 0.7438 | 925 | 1.1119 | 50006352 |
| 0.1166 | 0.7478 | 930 | 1.1140 | 50275992 |
| 0.2299 | 0.7518 | 935 | 1.1159 | 50547224 |
| 0.0837 | 0.7559 | 940 | 1.1146 | 50811760 |
| 0.1858 | 0.7599 | 945 | 1.1140 | 51084608 |
| 0.1008 | 0.7639 | 950 | 1.1140 | 51358360 |
| 0.1142 | 0.7679 | 955 | 1.1132 | 51634840 |
| 0.1369 | 0.7719 | 960 | 1.1133 | 51907800 |
| 0.1994 | 0.7760 | 965 | 1.1155 | 52177152 |
| 0.1486 | 0.7800 | 970 | 1.1120 | 52447296 |
| 0.1639 | 0.7840 | 975 | 1.1098 | 52707720 |
| 0.17 | 0.7880 | 980 | 1.1100 | 52975352 |
| 0.1352 | 0.7920 | 985 | 1.1121 | 53247312 |
| 0.2062 | 0.7961 | 990 | 1.1133 | 53511120 |
| 0.1653 | 0.8001 | 995 | 1.1122 | 53776256 |
| 0.1477 | 0.8041 | 1000 | 1.1100 | 54039120 |
| 0.1882 | 0.8081 | 1005 | 1.1101 | 54313664 |
| 0.204 | 0.8121 | 1010 | 1.1123 | 54585280 |
| 0.2283 | 0.8162 | 1015 | 1.1110 | 54858160 |
| 0.1394 | 0.8202 | 1020 | 1.1093 | 55133280 |
| 0.2045 | 0.8242 | 1025 | 1.1098 | 55405552 |
| 0.1561 | 0.8282 | 1030 | 1.1102 | 55676936 |
| 0.127 | 0.8322 | 1035 | 1.1096 | 55955232 |
| 0.1593 | 0.8363 | 1040 | 1.1093 | 56227728 |
| 0.1457 | 0.8403 | 1045 | 1.1085 | 56498840 |
| 0.1505 | 0.8443 | 1050 | 1.1090 | 56774088 |
| 0.0862 | 0.8483 | 1055 | 1.1083 | 57043608 |
| 0.1709 | 0.8523 | 1060 | 1.1089 | 57316712 |
| 0.1509 | 0.8564 | 1065 | 1.1101 | 57589400 |
| 0.0836 | 0.8604 | 1070 | 1.1123 | 57861224 |
| 0.0966 | 0.8644 | 1075 | 1.1111 | 58131600 |
| 0.1184 | 0.8684 | 1080 | 1.1087 | 58406088 |
| 0.1669 | 0.8724 | 1085 | 1.1105 | 58677056 |
| 0.1793 | 0.8765 | 1090 | 1.1105 | 58947920 |
| 0.1333 | 0.8805 | 1095 | 1.1075 | 59225112 |
| 0.1882 | 0.8845 | 1100 | 1.1071 | 59497416 |
| 0.1828 | 0.8885 | 1105 | 1.1118 | 59770208 |
| 0.1227 | 0.8926 | 1110 | 1.1080 | 60043840 |
| 0.1234 | 0.8966 | 1115 | 1.1054 | 60310768 |
| 0.1036 | 0.9006 | 1120 | 1.1096 | 60581280 |
| 0.1349 | 0.9046 | 1125 | 1.1095 | 60852688 |
| 0.1352 | 0.9086 | 1130 | 1.1063 | 61121240 |
| 0.1958 | 0.9127 | 1135 | 1.1101 | 61391848 |
| 0.1466 | 0.9167 | 1140 | 1.1118 | 61664592 |
| 0.1887 | 0.9207 | 1145 | 1.1101 | 61934816 |
| 0.1769 | 0.9247 | 1150 | 1.1103 | 62201872 |
| 0.2028 | 0.9287 | 1155 | 1.1101 | 62478232 |
| 0.1435 | 0.9328 | 1160 | 1.1093 | 62750744 |
| 0.1907 | 0.9368 | 1165 | 1.1085 | 63023736 |
| 0.1991 | 0.9408 | 1170 | 1.1089 | 63296368 |
| 0.0962 | 0.9448 | 1175 | 1.1070 | 63568840 |
| 0.095 | 0.9488 | 1180 | 1.1092 | 63836616 |
| 0.0938 | 0.9529 | 1185 | 1.1117 | 64100784 |
| 0.161 | 0.9569 | 1190 | 1.1090 | 64373400 |
| 0.1724 | 0.9609 | 1195 | 1.1078 | 64640352 |
| 0.1555 | 0.9649 | 1200 | 1.1077 | 64919536 |
| 0.1529 | 0.9689 | 1205 | 1.1102 | 65190544 |
| 0.1552 | 0.9730 | 1210 | 1.1076 | 65465016 |
| 0.1303 | 0.9770 | 1215 | 1.1060 | 65741824 |
| 0.1953 | 0.9810 | 1220 | 1.1070 | 66013568 |
| 0.1245 | 0.9850 | 1225 | 1.1057 | 66286920 |
| 0.1362 | 0.9890 | 1230 | 1.1054 | 66561008 |
| 0.217 | 0.9931 | 1235 | 1.1071 | 66831688 |
| 0.2241 | 0.9971 | 1240 | 1.1069 | 67100816 |
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1