metadata
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2
  results: []
collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2
This model is a fine-tuned version of google/gemma-2-2b; the training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 1.1076
- Num Input Tokens Seen: 55148120
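
The evaluation loss can be easier to interpret as a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The variable names are illustrative.

```python
import math

# Evaluation loss reported above (assumed: mean per-token cross-entropy in nats).
eval_loss = 1.1076

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption, a loss of 1.1076 corresponds to a perplexity of roughly 3.03.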
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
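
The relationship between these values can be sketched in a few lines of plain Python. This is not the training script. It assumes a single training device (consistent with 8 x 16 = 128), estimates the total number of optimizer steps from the results table (step 1015 is logged at epoch 0.9978), and assumes the warmup step count is rounded up. The helper `lr_at` is a hypothetical name for a constant-with-warmup schedule with linear warmup.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: a single training device, so the effective batch size is the
# per-device batch size times the gradient accumulation steps.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: total optimizer steps estimated from the results table
# (step 1015 is logged at epoch 0.9978).
total_steps = round(1015 / 0.9978)

# Assumption: the warmup step count is rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to learning_rate, then constant (hypothetical helper)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, total_steps, warmup_steps)
print(lr_at(0), lr_at(25), lr_at(warmup_steps), lr_at(1000))
```

With these assumptions the learning rate reaches 8e-06 after about 51 steps and stays there for the rest of the epoch.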
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6242 | 0.0049 | 5 | 1.3934 | 272064 |
1.6752 | 0.0098 | 10 | 1.3710 | 554032 |
1.6228 | 0.0147 | 15 | 1.3186 | 827360 |
1.543 | 0.0197 | 20 | 1.2658 | 1096136 |
1.423 | 0.0246 | 25 | 1.2235 | 1372816 |
1.3264 | 0.0295 | 30 | 1.1841 | 1647472 |
1.2995 | 0.0344 | 35 | 1.1788 | 1919584 |
1.1474 | 0.0393 | 40 | 1.1874 | 2196376 |
1.0666 | 0.0442 | 45 | 1.1899 | 2465560 |
0.985 | 0.0492 | 50 | 1.2221 | 2741672 |
0.713 | 0.0541 | 55 | 1.2507 | 3013656 |
0.5765 | 0.0590 | 60 | 1.2796 | 3290880 |
0.5448 | 0.0639 | 65 | 1.2517 | 3557720 |
0.4657 | 0.0688 | 70 | 1.2520 | 3832752 |
0.4618 | 0.0737 | 75 | 1.2386 | 4109368 |
0.3921 | 0.0786 | 80 | 1.2370 | 4375248 |
0.3891 | 0.0836 | 85 | 1.2284 | 4647088 |
0.4561 | 0.0885 | 90 | 1.2214 | 4917208 |
0.3254 | 0.0934 | 95 | 1.2225 | 5185928 |
0.2939 | 0.0983 | 100 | 1.2260 | 5452632 |
0.3003 | 0.1032 | 105 | 1.2119 | 5723992 |
0.2921 | 0.1081 | 110 | 1.2143 | 5998096 |
0.3005 | 0.1130 | 115 | 1.2009 | 6272024 |
0.2106 | 0.1180 | 120 | 1.2049 | 6544088 |
0.3227 | 0.1229 | 125 | 1.2043 | 6815696 |
0.359 | 0.1278 | 130 | 1.2067 | 7091168 |
0.2451 | 0.1327 | 135 | 1.2018 | 7363752 |
0.2543 | 0.1376 | 140 | 1.2051 | 7634680 |
0.2264 | 0.1425 | 145 | 1.1911 | 7902192 |
0.2881 | 0.1475 | 150 | 1.1969 | 8174216 |
0.2406 | 0.1524 | 155 | 1.1873 | 8446056 |
0.2712 | 0.1573 | 160 | 1.1878 | 8711432 |
0.2502 | 0.1622 | 165 | 1.1933 | 8986104 |
0.2625 | 0.1671 | 170 | 1.1817 | 9260624 |
0.2239 | 0.1720 | 175 | 1.1872 | 9537496 |
0.2087 | 0.1769 | 180 | 1.1822 | 9811632 |
0.2819 | 0.1819 | 185 | 1.1781 | 10083560 |
0.1772 | 0.1868 | 190 | 1.1825 | 10356680 |
0.2153 | 0.1917 | 195 | 1.1797 | 10627000 |
0.2606 | 0.1966 | 200 | 1.1768 | 10901816 |
0.183 | 0.2015 | 205 | 1.1799 | 11168408 |
0.1972 | 0.2064 | 210 | 1.1756 | 11441368 |
0.2959 | 0.2114 | 215 | 1.1733 | 11712792 |
0.2225 | 0.2163 | 220 | 1.1740 | 11983016 |
0.3001 | 0.2212 | 225 | 1.1673 | 12252912 |
0.2043 | 0.2261 | 230 | 1.1743 | 12527672 |
0.2225 | 0.2310 | 235 | 1.1721 | 12804760 |
0.2131 | 0.2359 | 240 | 1.1681 | 13073272 |
0.2541 | 0.2408 | 245 | 1.1697 | 13343952 |
0.2392 | 0.2458 | 250 | 1.1652 | 13615616 |
0.2222 | 0.2507 | 255 | 1.1673 | 13878200 |
0.2152 | 0.2556 | 260 | 1.1603 | 14145720 |
0.1775 | 0.2605 | 265 | 1.1601 | 14421576 |
0.184 | 0.2654 | 270 | 1.1659 | 14691256 |
0.1615 | 0.2703 | 275 | 1.1560 | 14966608 |
0.2042 | 0.2753 | 280 | 1.1613 | 15238320 |
0.2344 | 0.2802 | 285 | 1.1605 | 15514760 |
0.1502 | 0.2851 | 290 | 1.1520 | 15782440 |
0.1738 | 0.2900 | 295 | 1.1576 | 16044664 |
0.2125 | 0.2949 | 300 | 1.1566 | 16313976 |
0.2228 | 0.2998 | 305 | 1.1505 | 16586576 |
0.1751 | 0.3047 | 310 | 1.1548 | 16857848 |
0.2008 | 0.3097 | 315 | 1.1546 | 17126992 |
0.1452 | 0.3146 | 320 | 1.1527 | 17400296 |
0.2659 | 0.3195 | 325 | 1.1553 | 17668032 |
0.2173 | 0.3244 | 330 | 1.1508 | 17935336 |
0.215 | 0.3293 | 335 | 1.1485 | 18205424 |
0.2193 | 0.3342 | 340 | 1.1501 | 18481400 |
0.1883 | 0.3391 | 345 | 1.1484 | 18752120 |
0.1204 | 0.3441 | 350 | 1.1455 | 19022232 |
0.2041 | 0.3490 | 355 | 1.1473 | 19291984 |
0.1734 | 0.3539 | 360 | 1.1446 | 19560032 |
0.191 | 0.3588 | 365 | 1.1459 | 19841512 |
0.2036 | 0.3637 | 370 | 1.1427 | 20110248 |
0.227 | 0.3686 | 375 | 1.1416 | 20383840 |
0.2724 | 0.3736 | 380 | 1.1432 | 20651880 |
0.277 | 0.3785 | 385 | 1.1394 | 20925536 |
0.185 | 0.3834 | 390 | 1.1404 | 21190872 |
0.1613 | 0.3883 | 395 | 1.1423 | 21462104 |
0.2139 | 0.3932 | 400 | 1.1366 | 21735760 |
0.238 | 0.3981 | 405 | 1.1401 | 22007944 |
0.1772 | 0.4030 | 410 | 1.1446 | 22274704 |
0.2354 | 0.4080 | 415 | 1.1385 | 22551304 |
0.2089 | 0.4129 | 420 | 1.1372 | 22819992 |
0.1772 | 0.4178 | 425 | 1.1395 | 23085864 |
0.2116 | 0.4227 | 430 | 1.1360 | 23355776 |
0.1528 | 0.4276 | 435 | 1.1362 | 23630936 |
0.1801 | 0.4325 | 440 | 1.1363 | 23902680 |
0.152 | 0.4375 | 445 | 1.1318 | 24168248 |
0.237 | 0.4424 | 450 | 1.1363 | 24435488 |
0.1998 | 0.4473 | 455 | 1.1348 | 24710872 |
0.2259 | 0.4522 | 460 | 1.1325 | 24983416 |
0.2071 | 0.4571 | 465 | 1.1319 | 25250048 |
0.16 | 0.4620 | 470 | 1.1330 | 25521736 |
0.1693 | 0.4669 | 475 | 1.1312 | 25795336 |
0.2649 | 0.4719 | 480 | 1.1308 | 26066920 |
0.1038 | 0.4768 | 485 | 1.1307 | 26331024 |
0.1938 | 0.4817 | 490 | 1.1287 | 26598616 |
0.1767 | 0.4866 | 495 | 1.1319 | 26869544 |
0.3223 | 0.4915 | 500 | 1.1328 | 27140784 |
0.1802 | 0.4964 | 505 | 1.1282 | 27411872 |
0.1962 | 0.5014 | 510 | 1.1316 | 27675280 |
0.1977 | 0.5063 | 515 | 1.1293 | 27943040 |
0.1458 | 0.5112 | 520 | 1.1286 | 28217320 |
0.2375 | 0.5161 | 525 | 1.1290 | 28493040 |
0.2269 | 0.5210 | 530 | 1.1275 | 28762672 |
0.1589 | 0.5259 | 535 | 1.1280 | 29029744 |
0.2142 | 0.5308 | 540 | 1.1297 | 29299000 |
0.2219 | 0.5358 | 545 | 1.1282 | 29570248 |
0.1128 | 0.5407 | 550 | 1.1286 | 29847000 |
0.1866 | 0.5456 | 555 | 1.1272 | 30115376 |
0.1865 | 0.5505 | 560 | 1.1279 | 30389984 |
0.2061 | 0.5554 | 565 | 1.1234 | 30655792 |
0.1548 | 0.5603 | 570 | 1.1237 | 30933664 |
0.2025 | 0.5652 | 575 | 1.1249 | 31201768 |
0.2701 | 0.5702 | 580 | 1.1261 | 31476376 |
0.2446 | 0.5751 | 585 | 1.1236 | 31743576 |
0.1323 | 0.5800 | 590 | 1.1243 | 32012336 |
0.2005 | 0.5849 | 595 | 1.1241 | 32285872 |
0.1525 | 0.5898 | 600 | 1.1249 | 32558824 |
0.1703 | 0.5947 | 605 | 1.1236 | 32825608 |
0.1633 | 0.5997 | 610 | 1.1211 | 33097056 |
0.1968 | 0.6046 | 615 | 1.1234 | 33371136 |
0.2604 | 0.6095 | 620 | 1.1223 | 33637528 |
0.2091 | 0.6144 | 625 | 1.1225 | 33906600 |
0.1176 | 0.6193 | 630 | 1.1248 | 34176584 |
0.1487 | 0.6242 | 635 | 1.1229 | 34448496 |
0.199 | 0.6291 | 640 | 1.1209 | 34722752 |
0.1523 | 0.6341 | 645 | 1.1212 | 34990088 |
0.1457 | 0.6390 | 650 | 1.1237 | 35259080 |
0.2531 | 0.6439 | 655 | 1.1227 | 35525968 |
0.1487 | 0.6488 | 660 | 1.1193 | 35797952 |
0.1589 | 0.6537 | 665 | 1.1216 | 36072304 |
0.2855 | 0.6586 | 670 | 1.1224 | 36343472 |
0.1557 | 0.6636 | 675 | 1.1186 | 36614592 |
0.1411 | 0.6685 | 680 | 1.1202 | 36886360 |
0.2196 | 0.6734 | 685 | 1.1211 | 37158136 |
0.1054 | 0.6783 | 690 | 1.1204 | 37430296 |
0.2536 | 0.6832 | 695 | 1.1198 | 37703184 |
0.2347 | 0.6881 | 700 | 1.1187 | 37972000 |
0.2074 | 0.6930 | 705 | 1.1180 | 38244936 |
0.1818 | 0.6980 | 710 | 1.1156 | 38515152 |
0.1484 | 0.7029 | 715 | 1.1196 | 38786104 |
0.234 | 0.7078 | 720 | 1.1224 | 39053816 |
0.1783 | 0.7127 | 725 | 1.1179 | 39323896 |
0.159 | 0.7176 | 730 | 1.1158 | 39599848 |
0.1323 | 0.7225 | 735 | 1.1204 | 39869656 |
0.1816 | 0.7275 | 740 | 1.1216 | 40137064 |
0.175 | 0.7324 | 745 | 1.1173 | 40405408 |
0.2641 | 0.7373 | 750 | 1.1163 | 40673816 |
0.1334 | 0.7422 | 755 | 1.1151 | 40936336 |
0.2107 | 0.7471 | 760 | 1.1186 | 41207808 |
0.2213 | 0.7520 | 765 | 1.1162 | 41484608 |
0.1493 | 0.7569 | 770 | 1.1133 | 41758384 |
0.1367 | 0.7619 | 775 | 1.1153 | 42031848 |
0.1636 | 0.7668 | 780 | 1.1173 | 42293304 |
0.1492 | 0.7717 | 785 | 1.1160 | 42563384 |
0.2128 | 0.7766 | 790 | 1.1158 | 42825784 |
0.2324 | 0.7815 | 795 | 1.1155 | 43101064 |
0.2325 | 0.7864 | 800 | 1.1134 | 43373512 |
0.1865 | 0.7913 | 805 | 1.1167 | 43637872 |
0.2124 | 0.7963 | 810 | 1.1154 | 43905256 |
0.1661 | 0.8012 | 815 | 1.1109 | 44173024 |
0.1994 | 0.8061 | 820 | 1.1108 | 44451088 |
0.2008 | 0.8110 | 825 | 1.1119 | 44724768 |
0.1678 | 0.8159 | 830 | 1.1130 | 44995112 |
0.2089 | 0.8208 | 835 | 1.1126 | 45265880 |
0.2064 | 0.8258 | 840 | 1.1119 | 45532968 |
0.2039 | 0.8307 | 845 | 1.1133 | 45810568 |
0.152 | 0.8356 | 850 | 1.1124 | 46083736 |
0.1731 | 0.8405 | 855 | 1.1112 | 46356904 |
0.2052 | 0.8454 | 860 | 1.1110 | 46627936 |
0.2187 | 0.8503 | 865 | 1.1093 | 46903968 |
0.2456 | 0.8552 | 870 | 1.1106 | 47175120 |
0.1912 | 0.8602 | 875 | 1.1124 | 47446448 |
0.1495 | 0.8651 | 880 | 1.1115 | 47725256 |
0.2542 | 0.8700 | 885 | 1.1117 | 47996528 |
0.202 | 0.8749 | 890 | 1.1092 | 48262184 |
0.0888 | 0.8798 | 895 | 1.1104 | 48535040 |
0.1544 | 0.8847 | 900 | 1.1143 | 48812872 |
0.1341 | 0.8897 | 905 | 1.1120 | 49088752 |
0.1137 | 0.8946 | 910 | 1.1115 | 49363536 |
0.2127 | 0.8995 | 915 | 1.1076 | 49640256 |
0.2183 | 0.9044 | 920 | 1.1078 | 49908416 |
0.1487 | 0.9093 | 925 | 1.1101 | 50178216 |
0.2102 | 0.9142 | 930 | 1.1093 | 50445536 |
0.2309 | 0.9191 | 935 | 1.1090 | 50712632 |
0.2157 | 0.9241 | 940 | 1.1096 | 50986048 |
0.1194 | 0.9290 | 945 | 1.1090 | 51259968 |
0.1138 | 0.9339 | 950 | 1.1091 | 51530784 |
0.2443 | 0.9388 | 955 | 1.1094 | 51803680 |
0.1772 | 0.9437 | 960 | 1.1085 | 52071288 |
0.1181 | 0.9486 | 965 | 1.1093 | 52337984 |
0.1651 | 0.9536 | 970 | 1.1100 | 52608272 |
0.1881 | 0.9585 | 975 | 1.1097 | 52870304 |
0.2214 | 0.9634 | 980 | 1.1055 | 53151200 |
0.1554 | 0.9683 | 985 | 1.1063 | 53423592 |
0.1906 | 0.9732 | 990 | 1.1078 | 53699536 |
0.1411 | 0.9781 | 995 | 1.1064 | 53969424 |
0.1967 | 0.9830 | 1000 | 1.1058 | 54241280 |
0.1977 | 0.9880 | 1005 | 1.1067 | 54503696 |
0.1763 | 0.9929 | 1010 | 1.1053 | 54769592 |
0.1614 | 0.9978 | 1015 | 1.1067 | 55039120 |
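
The table can be summarized programmatically. The sketch below uses a handful of rows copied from the table (not the full log) to find the lowest validation loss in that excerpt and to compute the gap between validation and training loss. The names `rows`, `best`, and `gaps` are illustrative.

```python
# A few rows copied from the table above: (step, training loss, validation loss).
# The first row has no training loss logged, so it is stored as None.
rows = [
    (0, None, 1.3956),
    (35, 1.2995, 1.1788),
    (60, 0.5765, 1.2796),
    (500, 0.3223, 1.1328),
    (1010, 0.1763, 1.1053),
    (1015, 0.1614, 1.1067),
]

# Row with the lowest validation loss in this excerpt.
best = min(rows, key=lambda row: row[2])

# Validation loss minus training loss for rows that logged a training loss.
gaps = {
    step: round(val - train, 4)
    for step, train, val in rows
    if train is not None
}

print("lowest validation loss in excerpt:", best)
for step, gap in gaps.items():
    print(f"step {step}: validation - training = {gap}")
```

In this excerpt the training loss starts above the validation loss and ends far below it, while the validation loss changes much less over the same span.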
Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
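
When trying to reproduce the environment, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch using only the standard library. The function names `installed_version` and `compare_versions` are hypothetical, and the package keys are the PyPI distribution names (PyTorch is distributed as `torch`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by PyPI distribution name.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected_versions):
    """Map each package name to (expected, installed, exact_match)."""
    report = {}
    for name, wanted in expected_versions.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in compare_versions(expected).items():
    print(f"{name}: expected {wanted}, installed {found}, match={ok}")
```

The comparison is an exact string match, so a different CUDA build suffix on PyTorch is reported as a mismatch even if the base version is the same.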