# collapse_gemma-2-2b_hs2_accumulate_iter15_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.1081
- Num Input Tokens Seen: 77891720
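For intuition, a cross-entropy loss of 1.1081 corresponds to a perplexity of roughly 3.03. A minimal sanity-check sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the `transformers` default):

```python
import math

# Reported final evaluation loss (assumed: mean per-token cross-entropy in nats)
eval_loss = 1.1081

# Perplexity is the exponential of the cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ≈ 3.03
```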
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
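The batch-size settings above are related by a simple product, and the warmup length follows from the warmup ratio. A short sketch checking both (the total step count of 1430 is taken from the results table below; exact warmup rounding is trainer-specific, so the last value is an approximation):

```python
train_batch_size = 8
gradient_accumulation_steps = 16
lr_scheduler_warmup_ratio = 0.05
total_optimizer_steps = 1430  # final step logged in the results table

# Effective batch size per optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the value listed above

# Approximate number of warmup steps (rounding behavior varies by trainer)
warmup_steps = int(total_optimizer_steps * lr_scheduler_warmup_ratio)
print(warmup_steps)  # ≈ 71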
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.683 | 0.0035 | 5 | 1.3904 | 268608 |
1.7156 | 0.0070 | 10 | 1.3791 | 539728 |
1.6053 | 0.0105 | 15 | 1.3499 | 805648 |
1.4691 | 0.0140 | 20 | 1.3057 | 1077192 |
1.4175 | 0.0175 | 25 | 1.2625 | 1349808 |
1.3844 | 0.0209 | 30 | 1.2375 | 1621728 |
1.249 | 0.0244 | 35 | 1.2082 | 1899472 |
1.1207 | 0.0279 | 40 | 1.1959 | 2178936 |
1.0909 | 0.0314 | 45 | 1.2191 | 2442816 |
0.9671 | 0.0349 | 50 | 1.2171 | 2723080 |
0.8792 | 0.0384 | 55 | 1.2577 | 2997192 |
0.707 | 0.0419 | 60 | 1.2684 | 3270024 |
0.5271 | 0.0454 | 65 | 1.3094 | 3539056 |
0.4879 | 0.0489 | 70 | 1.2817 | 3811192 |
0.4132 | 0.0524 | 75 | 1.2756 | 4089752 |
0.3637 | 0.0559 | 80 | 1.2674 | 4360680 |
0.3605 | 0.0594 | 85 | 1.2434 | 4635024 |
0.2507 | 0.0628 | 90 | 1.2377 | 4908224 |
0.3487 | 0.0663 | 95 | 1.2206 | 5179720 |
0.318 | 0.0698 | 100 | 1.2428 | 5445216 |
0.2242 | 0.0733 | 105 | 1.2300 | 5720944 |
0.2678 | 0.0768 | 110 | 1.2283 | 5989512 |
0.2989 | 0.0803 | 115 | 1.2170 | 6258104 |
0.2574 | 0.0838 | 120 | 1.2120 | 6527512 |
0.1835 | 0.0873 | 125 | 1.2104 | 6794096 |
0.2531 | 0.0908 | 130 | 1.2045 | 7070160 |
0.3273 | 0.0943 | 135 | 1.2095 | 7343296 |
0.208 | 0.0978 | 140 | 1.1983 | 7619008 |
0.3085 | 0.1012 | 145 | 1.2024 | 7890440 |
0.2839 | 0.1047 | 150 | 1.1894 | 8166976 |
0.2111 | 0.1082 | 155 | 1.2019 | 8443096 |
0.228 | 0.1117 | 160 | 1.1932 | 8713680 |
0.2772 | 0.1152 | 165 | 1.1911 | 8984200 |
0.2128 | 0.1187 | 170 | 1.2005 | 9260024 |
0.2098 | 0.1222 | 175 | 1.1897 | 9535504 |
0.2509 | 0.1257 | 180 | 1.1965 | 9806016 |
0.1895 | 0.1292 | 185 | 1.1897 | 10082056 |
0.1858 | 0.1327 | 190 | 1.1865 | 10353256 |
0.1466 | 0.1362 | 195 | 1.1896 | 10632272 |
0.1778 | 0.1397 | 200 | 1.1831 | 10909248 |
0.1661 | 0.1431 | 205 | 1.1837 | 11178344 |
0.2259 | 0.1466 | 210 | 1.1819 | 11455192 |
0.1824 | 0.1501 | 215 | 1.1840 | 11729536 |
0.1419 | 0.1536 | 220 | 1.1836 | 11998968 |
0.1676 | 0.1571 | 225 | 1.1791 | 12269144 |
0.1508 | 0.1606 | 230 | 1.1757 | 12545496 |
0.188 | 0.1641 | 235 | 1.1783 | 12823808 |
0.2608 | 0.1676 | 240 | 1.1748 | 13095184 |
0.098 | 0.1711 | 245 | 1.1734 | 13366568 |
0.1447 | 0.1746 | 250 | 1.1802 | 13635776 |
0.2193 | 0.1781 | 255 | 1.1700 | 13899536 |
0.1439 | 0.1815 | 260 | 1.1726 | 14173056 |
0.2237 | 0.1850 | 265 | 1.1764 | 14444536 |
0.2628 | 0.1885 | 270 | 1.1666 | 14724096 |
0.1653 | 0.1920 | 275 | 1.1691 | 15001520 |
0.1856 | 0.1955 | 280 | 1.1666 | 15271800 |
0.181 | 0.1990 | 285 | 1.1650 | 15545208 |
0.1772 | 0.2025 | 290 | 1.1670 | 15813104 |
0.112 | 0.2060 | 295 | 1.1640 | 16085112 |
0.1872 | 0.2095 | 300 | 1.1734 | 16357080 |
0.2365 | 0.2130 | 305 | 1.1666 | 16633032 |
0.2251 | 0.2165 | 310 | 1.1644 | 16905296 |
0.1395 | 0.2200 | 315 | 1.1714 | 17179632 |
0.1538 | 0.2234 | 320 | 1.1625 | 17447920 |
0.1432 | 0.2269 | 325 | 1.1639 | 17722168 |
0.1993 | 0.2304 | 330 | 1.1658 | 17994352 |
0.2021 | 0.2339 | 335 | 1.1589 | 18266408 |
0.2699 | 0.2374 | 340 | 1.1552 | 18541520 |
0.133 | 0.2409 | 345 | 1.1580 | 18810960 |
0.1804 | 0.2444 | 350 | 1.1589 | 19086320 |
0.1332 | 0.2479 | 355 | 1.1589 | 19366912 |
0.1872 | 0.2514 | 360 | 1.1610 | 19633032 |
0.1209 | 0.2549 | 365 | 1.1549 | 19897944 |
0.1408 | 0.2584 | 370 | 1.1596 | 20170656 |
0.1767 | 0.2618 | 375 | 1.1677 | 20442472 |
0.1285 | 0.2653 | 380 | 1.1564 | 20718024 |
0.1589 | 0.2688 | 385 | 1.1571 | 20986216 |
0.1799 | 0.2723 | 390 | 1.1607 | 21258656 |
0.194 | 0.2758 | 395 | 1.1555 | 21534424 |
0.1321 | 0.2793 | 400 | 1.1530 | 21809152 |
0.181 | 0.2828 | 405 | 1.1575 | 22076432 |
0.1589 | 0.2863 | 410 | 1.1543 | 22351888 |
0.2326 | 0.2898 | 415 | 1.1535 | 22625568 |
0.1409 | 0.2933 | 420 | 1.1565 | 22905072 |
0.1945 | 0.2968 | 425 | 1.1495 | 23173072 |
0.1987 | 0.3003 | 430 | 1.1497 | 23438248 |
0.1867 | 0.3037 | 435 | 1.1530 | 23711176 |
0.2078 | 0.3072 | 440 | 1.1501 | 23984736 |
0.2226 | 0.3107 | 445 | 1.1509 | 24255168 |
0.242 | 0.3142 | 450 | 1.1486 | 24519976 |
0.1088 | 0.3177 | 455 | 1.1478 | 24792248 |
0.1165 | 0.3212 | 460 | 1.1499 | 25068120 |
0.1584 | 0.3247 | 465 | 1.1480 | 25344936 |
0.1865 | 0.3282 | 470 | 1.1462 | 25606624 |
0.1017 | 0.3317 | 475 | 1.1477 | 25878984 |
0.1608 | 0.3352 | 480 | 1.1443 | 26148736 |
0.2164 | 0.3387 | 485 | 1.1436 | 26419200 |
0.2072 | 0.3421 | 490 | 1.1422 | 26686976 |
0.1927 | 0.3456 | 495 | 1.1431 | 26958920 |
0.1422 | 0.3491 | 500 | 1.1441 | 27231656 |
0.1536 | 0.3526 | 505 | 1.1436 | 27504048 |
0.1454 | 0.3561 | 510 | 1.1406 | 27779360 |
0.1831 | 0.3596 | 515 | 1.1402 | 28045744 |
0.1535 | 0.3631 | 520 | 1.1466 | 28314464 |
0.1683 | 0.3666 | 525 | 1.1444 | 28579728 |
0.147 | 0.3701 | 530 | 1.1409 | 28851064 |
0.1583 | 0.3736 | 535 | 1.1422 | 29119904 |
0.1564 | 0.3771 | 540 | 1.1462 | 29395160 |
0.227 | 0.3806 | 545 | 1.1404 | 29665040 |
0.1462 | 0.3840 | 550 | 1.1402 | 29942320 |
0.204 | 0.3875 | 555 | 1.1421 | 30215240 |
0.1626 | 0.3910 | 560 | 1.1408 | 30480344 |
0.1726 | 0.3945 | 565 | 1.1373 | 30747648 |
0.1747 | 0.3980 | 570 | 1.1417 | 31018320 |
0.1556 | 0.4015 | 575 | 1.1417 | 31288856 |
0.1613 | 0.4050 | 580 | 1.1381 | 31562912 |
0.1924 | 0.4085 | 585 | 1.1396 | 31824568 |
0.1798 | 0.4120 | 590 | 1.1391 | 32100304 |
0.1421 | 0.4155 | 595 | 1.1356 | 32367776 |
0.2076 | 0.4190 | 600 | 1.1356 | 32645744 |
0.1201 | 0.4224 | 605 | 1.1346 | 32925376 |
0.1629 | 0.4259 | 610 | 1.1357 | 33199912 |
0.1634 | 0.4294 | 615 | 1.1349 | 33477704 |
0.1295 | 0.4329 | 620 | 1.1357 | 33749856 |
0.0998 | 0.4364 | 625 | 1.1357 | 34025880 |
0.1166 | 0.4399 | 630 | 1.1343 | 34302848 |
0.147 | 0.4434 | 635 | 1.1363 | 34575392 |
0.1328 | 0.4469 | 640 | 1.1358 | 34848888 |
0.1339 | 0.4504 | 645 | 1.1371 | 35120232 |
0.1733 | 0.4539 | 650 | 1.1338 | 35400928 |
0.1444 | 0.4574 | 655 | 1.1357 | 35671920 |
0.1588 | 0.4609 | 660 | 1.1350 | 35941448 |
0.2018 | 0.4643 | 665 | 1.1311 | 36214680 |
0.1342 | 0.4678 | 670 | 1.1330 | 36486480 |
0.1565 | 0.4713 | 675 | 1.1334 | 36756072 |
0.1986 | 0.4748 | 680 | 1.1318 | 37021600 |
0.1767 | 0.4783 | 685 | 1.1312 | 37278080 |
0.1319 | 0.4818 | 690 | 1.1332 | 37547752 |
0.1876 | 0.4853 | 695 | 1.1322 | 37813912 |
0.198 | 0.4888 | 700 | 1.1276 | 38083664 |
0.098 | 0.4923 | 705 | 1.1291 | 38358384 |
0.117 | 0.4958 | 710 | 1.1278 | 38633224 |
0.1131 | 0.4993 | 715 | 1.1284 | 38909376 |
0.1825 | 0.5027 | 720 | 1.1295 | 39174824 |
0.1617 | 0.5062 | 725 | 1.1298 | 39442704 |
0.1818 | 0.5097 | 730 | 1.1267 | 39717776 |
0.1564 | 0.5132 | 735 | 1.1322 | 39993408 |
0.1976 | 0.5167 | 740 | 1.1297 | 40260568 |
0.1577 | 0.5202 | 745 | 1.1265 | 40529400 |
0.1166 | 0.5237 | 750 | 1.1280 | 40803152 |
0.141 | 0.5272 | 755 | 1.1282 | 41072808 |
0.1728 | 0.5307 | 760 | 1.1275 | 41346128 |
0.1537 | 0.5342 | 765 | 1.1259 | 41621680 |
0.1102 | 0.5377 | 770 | 1.1262 | 41895272 |
0.1062 | 0.5412 | 775 | 1.1299 | 42162592 |
0.0909 | 0.5446 | 780 | 1.1295 | 42430120 |
0.1499 | 0.5481 | 785 | 1.1261 | 42707144 |
0.1336 | 0.5516 | 790 | 1.1272 | 42987560 |
0.1555 | 0.5551 | 795 | 1.1254 | 43252544 |
0.1196 | 0.5586 | 800 | 1.1237 | 43526296 |
0.1249 | 0.5621 | 805 | 1.1267 | 43799216 |
0.1883 | 0.5656 | 810 | 1.1276 | 44063752 |
0.1699 | 0.5691 | 815 | 1.1232 | 44337928 |
0.2577 | 0.5726 | 820 | 1.1236 | 44614144 |
0.1646 | 0.5761 | 825 | 1.1241 | 44877920 |
0.1446 | 0.5796 | 830 | 1.1231 | 45143520 |
0.0868 | 0.5830 | 835 | 1.1214 | 45414720 |
0.1761 | 0.5865 | 840 | 1.1237 | 45680552 |
0.136 | 0.5900 | 845 | 1.1233 | 45955608 |
0.1395 | 0.5935 | 850 | 1.1257 | 46223576 |
0.1153 | 0.5970 | 855 | 1.1246 | 46498056 |
0.1156 | 0.6005 | 860 | 1.1217 | 46769120 |
0.1864 | 0.6040 | 865 | 1.1260 | 47046296 |
0.2383 | 0.6075 | 870 | 1.1252 | 47316616 |
0.1232 | 0.6110 | 875 | 1.1216 | 47589216 |
0.1258 | 0.6145 | 880 | 1.1229 | 47861240 |
0.1455 | 0.6180 | 885 | 1.1272 | 48130192 |
0.1859 | 0.6215 | 890 | 1.1227 | 48402184 |
0.1112 | 0.6249 | 895 | 1.1227 | 48671768 |
0.2105 | 0.6284 | 900 | 1.1256 | 48948288 |
0.103 | 0.6319 | 905 | 1.1222 | 49223488 |
0.2064 | 0.6354 | 910 | 1.1203 | 49493928 |
0.119 | 0.6389 | 915 | 1.1203 | 49770904 |
0.155 | 0.6424 | 920 | 1.1232 | 50040960 |
0.1634 | 0.6459 | 925 | 1.1205 | 50310880 |
0.1476 | 0.6494 | 930 | 1.1189 | 50578432 |
0.1155 | 0.6529 | 935 | 1.1210 | 50849496 |
0.1976 | 0.6564 | 940 | 1.1211 | 51118728 |
0.1685 | 0.6599 | 945 | 1.1183 | 51393656 |
0.1498 | 0.6633 | 950 | 1.1182 | 51665192 |
0.1313 | 0.6668 | 955 | 1.1184 | 51939848 |
0.1431 | 0.6703 | 960 | 1.1193 | 52212432 |
0.1369 | 0.6738 | 965 | 1.1204 | 52488344 |
0.1626 | 0.6773 | 970 | 1.1185 | 52762336 |
0.1574 | 0.6808 | 975 | 1.1186 | 53037688 |
0.1742 | 0.6843 | 980 | 1.1187 | 53302672 |
0.0953 | 0.6878 | 985 | 1.1169 | 53575096 |
0.188 | 0.6913 | 990 | 1.1167 | 53847848 |
0.1145 | 0.6948 | 995 | 1.1180 | 54119184 |
0.1698 | 0.6983 | 1000 | 1.1211 | 54394464 |
0.1842 | 0.7018 | 1005 | 1.1167 | 54664896 |
0.1587 | 0.7052 | 1010 | 1.1173 | 54937696 |
0.0725 | 0.7087 | 1015 | 1.1168 | 55220984 |
0.1504 | 0.7122 | 1020 | 1.1162 | 55492144 |
0.1698 | 0.7157 | 1025 | 1.1180 | 55768240 |
0.1672 | 0.7192 | 1030 | 1.1186 | 56042128 |
0.1386 | 0.7227 | 1035 | 1.1160 | 56316408 |
0.0833 | 0.7262 | 1040 | 1.1155 | 56587048 |
0.1524 | 0.7297 | 1045 | 1.1166 | 56868376 |
0.1016 | 0.7332 | 1050 | 1.1149 | 57141272 |
0.1248 | 0.7367 | 1055 | 1.1144 | 57406280 |
0.1628 | 0.7402 | 1060 | 1.1172 | 57679648 |
0.137 | 0.7437 | 1065 | 1.1170 | 57951616 |
0.1017 | 0.7471 | 1070 | 1.1144 | 58226088 |
0.1118 | 0.7506 | 1075 | 1.1139 | 58500696 |
0.2009 | 0.7541 | 1080 | 1.1153 | 58770144 |
0.1148 | 0.7576 | 1085 | 1.1170 | 59042656 |
0.1118 | 0.7611 | 1090 | 1.1153 | 59315840 |
0.1041 | 0.7646 | 1095 | 1.1155 | 59582544 |
0.1488 | 0.7681 | 1100 | 1.1186 | 59859680 |
0.1096 | 0.7716 | 1105 | 1.1146 | 60133584 |
0.0987 | 0.7751 | 1110 | 1.1124 | 60401336 |
0.2119 | 0.7786 | 1115 | 1.1139 | 60680200 |
0.0871 | 0.7821 | 1120 | 1.1146 | 60950888 |
0.168 | 0.7855 | 1125 | 1.1140 | 61221400 |
0.1152 | 0.7890 | 1130 | 1.1141 | 61488952 |
0.1289 | 0.7925 | 1135 | 1.1142 | 61761696 |
0.1567 | 0.7960 | 1140 | 1.1124 | 62035848 |
0.2486 | 0.7995 | 1145 | 1.1139 | 62307416 |
0.1375 | 0.8030 | 1150 | 1.1137 | 62578952 |
0.1956 | 0.8065 | 1155 | 1.1131 | 62849680 |
0.1513 | 0.8100 | 1160 | 1.1131 | 63123400 |
0.1511 | 0.8135 | 1165 | 1.1120 | 63396592 |
0.1576 | 0.8170 | 1170 | 1.1140 | 63671280 |
0.2095 | 0.8205 | 1175 | 1.1117 | 63942752 |
0.1327 | 0.8240 | 1180 | 1.1097 | 64220128 |
0.1466 | 0.8274 | 1185 | 1.1116 | 64488168 |
0.0961 | 0.8309 | 1190 | 1.1126 | 64757816 |
0.1599 | 0.8344 | 1195 | 1.1113 | 65036800 |
0.1632 | 0.8379 | 1200 | 1.1099 | 65307128 |
0.1696 | 0.8414 | 1205 | 1.1098 | 65579368 |
0.1505 | 0.8449 | 1210 | 1.1111 | 65848768 |
0.1612 | 0.8484 | 1215 | 1.1112 | 66124200 |
0.2065 | 0.8519 | 1220 | 1.1093 | 66389832 |
0.113 | 0.8554 | 1225 | 1.1113 | 66665504 |
0.1861 | 0.8589 | 1230 | 1.1123 | 66935976 |
0.156 | 0.8624 | 1235 | 1.1112 | 67204520 |
0.1168 | 0.8658 | 1240 | 1.1108 | 67464344 |
0.0982 | 0.8693 | 1245 | 1.1116 | 67727448 |
0.1866 | 0.8728 | 1250 | 1.1105 | 67994952 |
0.1093 | 0.8763 | 1255 | 1.1090 | 68270080 |
0.1833 | 0.8798 | 1260 | 1.1091 | 68538632 |
0.1068 | 0.8833 | 1265 | 1.1090 | 68816784 |
0.1257 | 0.8868 | 1270 | 1.1102 | 69086096 |
0.1563 | 0.8903 | 1275 | 1.1119 | 69359936 |
0.1098 | 0.8938 | 1280 | 1.1113 | 69632920 |
0.1867 | 0.8973 | 1285 | 1.1084 | 69907920 |
0.111 | 0.9008 | 1290 | 1.1093 | 70176696 |
0.1504 | 0.9043 | 1295 | 1.1099 | 70451488 |
0.2302 | 0.9077 | 1300 | 1.1102 | 70711200 |
0.1971 | 0.9112 | 1305 | 1.1106 | 70983264 |
0.1843 | 0.9147 | 1310 | 1.1110 | 71248496 |
0.1476 | 0.9182 | 1315 | 1.1091 | 71527032 |
0.0964 | 0.9217 | 1320 | 1.1066 | 71798696 |
0.1743 | 0.9252 | 1325 | 1.1079 | 72075456 |
0.0996 | 0.9287 | 1330 | 1.1102 | 72343440 |
0.0989 | 0.9322 | 1335 | 1.1095 | 72613832 |
0.1511 | 0.9357 | 1340 | 1.1072 | 72886808 |
0.1586 | 0.9392 | 1345 | 1.1069 | 73158712 |
0.1616 | 0.9427 | 1350 | 1.1090 | 73425704 |
0.1458 | 0.9461 | 1355 | 1.1088 | 73695768 |
0.1181 | 0.9496 | 1360 | 1.1081 | 73965592 |
0.1558 | 0.9531 | 1365 | 1.1087 | 74237928 |
0.0924 | 0.9566 | 1370 | 1.1103 | 74509232 |
0.1207 | 0.9601 | 1375 | 1.1095 | 74777112 |
0.1366 | 0.9636 | 1380 | 1.1073 | 75041152 |
0.2083 | 0.9671 | 1385 | 1.1076 | 75322960 |
0.1933 | 0.9706 | 1390 | 1.1085 | 75590152 |
0.1358 | 0.9741 | 1395 | 1.1080 | 75872072 |
0.1317 | 0.9776 | 1400 | 1.1078 | 76145056 |
0.1305 | 0.9811 | 1405 | 1.1080 | 76417888 |
0.1956 | 0.9846 | 1410 | 1.1082 | 76688480 |
0.1598 | 0.9880 | 1415 | 1.1083 | 76959576 |
0.2193 | 0.9915 | 1420 | 1.1082 | 77228864 |
0.2047 | 0.9950 | 1425 | 1.1078 | 77508456 |
0.1036 | 0.9985 | 1430 | 1.1081 | 77781776 |
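A rough consistency check on the table above: dividing the final token count by the final step count gives the average tokens consumed per optimizer step, and dividing that by the effective batch size of 128 gives the average packed sequence length. Both are back-of-envelope figures derived only from the logged values:

```python
total_tokens = 77891720   # final "Input Tokens Seen" in the table
total_steps = 1430        # final logged optimizer step
effective_batch = 128     # total_train_batch_size from the hyperparameters

tokens_per_step = total_tokens / total_steps
tokens_per_sequence = tokens_per_step / effective_batch
print(f"{tokens_per_step:.0f} tokens/step")         # ≈ 54470
print(f"{tokens_per_sequence:.0f} tokens/sequence") # ≈ 426
```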
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1