collapse_gemma-2-2b_hs2_accumulate_iter16_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1055
- Num Input Tokens Seen: 82844176
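The evaluation loss above can be restated as a perplexity, which some readers find easier to compare across models. This is a minimal sketch: it assumes the reported loss is a mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card), and the function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Final evaluation loss as reported in this card.
final_eval_loss = 1.1055
print(f"perplexity ~= {perplexity(final_eval_loss):.2f}")  # about 3.02
```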
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
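The derived quantities in the list above can be checked with a little arithmetic. The sketch below is not the training code; it assumes a single device (consistent with 8 × 16 = 128), takes the total of 1535 optimizer steps from the last row of the results table, and assumes the warmup length is the ceiling of `warmup_ratio × total_steps`. The helper names are hypothetical.

```python
import math


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps, assuming the ratio is rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step: int, base_lr: float, n_warmup: int) -> float:
    """Constant-with-warmup schedule: linear ramp from 0, then flat at base_lr."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


total_steps = 1535  # assumption: taken from the final row of the results table
n_warmup = warmup_steps(total_steps, 0.05)

print(effective_batch_size(8, 16))       # 128, matching total_train_batch_size
print(n_warmup)                          # 77
print(lr_at_step(1000, 8e-06, n_warmup)) # 8e-06 once warmup is over
```

Under these assumptions the learning rate reaches its full value of 8e-06 after roughly the first 5% of training and stays there, since the schedule has no decay phase.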
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5039 | 0.0033 | 5 | 1.3906 | 271576 |
1.4852 | 0.0065 | 10 | 1.3816 | 537968 |
1.5977 | 0.0098 | 15 | 1.3530 | 800136 |
1.5362 | 0.0130 | 20 | 1.3155 | 1067920 |
1.4509 | 0.0163 | 25 | 1.2703 | 1339288 |
1.3391 | 0.0195 | 30 | 1.2378 | 1612984 |
1.3445 | 0.0228 | 35 | 1.2086 | 1886640 |
1.1707 | 0.0261 | 40 | 1.1952 | 2160616 |
1.0753 | 0.0293 | 45 | 1.2140 | 2425456 |
0.9626 | 0.0326 | 50 | 1.2235 | 2691704 |
0.8254 | 0.0358 | 55 | 1.2338 | 2958568 |
0.8105 | 0.0391 | 60 | 1.3020 | 3231104 |
0.6987 | 0.0423 | 65 | 1.2881 | 3500656 |
0.5053 | 0.0456 | 70 | 1.2994 | 3772760 |
0.5168 | 0.0488 | 75 | 1.2769 | 4045296 |
0.3788 | 0.0521 | 80 | 1.2789 | 4315832 |
0.3937 | 0.0554 | 85 | 1.2555 | 4587440 |
0.3916 | 0.0586 | 90 | 1.2383 | 4853840 |
0.3171 | 0.0619 | 95 | 1.2444 | 5127152 |
0.3361 | 0.0651 | 100 | 1.2270 | 5390624 |
0.2571 | 0.0684 | 105 | 1.2449 | 5656176 |
0.3089 | 0.0716 | 110 | 1.2219 | 5929696 |
0.2787 | 0.0749 | 115 | 1.2228 | 6196144 |
0.2707 | 0.0782 | 120 | 1.2161 | 6469696 |
0.3436 | 0.0814 | 125 | 1.2292 | 6745984 |
0.2178 | 0.0847 | 130 | 1.2049 | 7007096 |
0.3035 | 0.0879 | 135 | 1.2150 | 7271696 |
0.1814 | 0.0912 | 140 | 1.2019 | 7540048 |
0.2391 | 0.0944 | 145 | 1.1975 | 7812248 |
0.2421 | 0.0977 | 150 | 1.2095 | 8080240 |
0.2218 | 0.1010 | 155 | 1.1945 | 8354432 |
0.2586 | 0.1042 | 160 | 1.2109 | 8625504 |
0.2395 | 0.1075 | 165 | 1.1940 | 8903896 |
0.1725 | 0.1107 | 170 | 1.2083 | 9173552 |
0.206 | 0.1140 | 175 | 1.2034 | 9445520 |
0.2287 | 0.1172 | 180 | 1.1909 | 9713304 |
0.2314 | 0.1205 | 185 | 1.1898 | 9981712 |
0.1459 | 0.1238 | 190 | 1.1877 | 10258280 |
0.1671 | 0.1270 | 195 | 1.1877 | 10526920 |
0.1922 | 0.1303 | 200 | 1.1931 | 10801744 |
0.2319 | 0.1335 | 205 | 1.1904 | 11074120 |
0.2302 | 0.1368 | 210 | 1.1907 | 11343384 |
0.1056 | 0.1400 | 215 | 1.1908 | 11613728 |
0.2058 | 0.1433 | 220 | 1.1875 | 11884712 |
0.1874 | 0.1465 | 225 | 1.1953 | 12155176 |
0.1981 | 0.1498 | 230 | 1.1887 | 12422120 |
0.177 | 0.1531 | 235 | 1.1822 | 12694568 |
0.1469 | 0.1563 | 240 | 1.1815 | 12966936 |
0.2289 | 0.1596 | 245 | 1.1804 | 13244512 |
0.1615 | 0.1628 | 250 | 1.1781 | 13516568 |
0.2107 | 0.1661 | 255 | 1.1767 | 13779896 |
0.1211 | 0.1693 | 260 | 1.1841 | 14050808 |
0.1391 | 0.1726 | 265 | 1.1759 | 14318640 |
0.1841 | 0.1759 | 270 | 1.1759 | 14594040 |
0.1822 | 0.1791 | 275 | 1.1789 | 14868088 |
0.1746 | 0.1824 | 280 | 1.1716 | 15132064 |
0.1149 | 0.1856 | 285 | 1.1725 | 15405848 |
0.1279 | 0.1889 | 290 | 1.1728 | 15673248 |
0.1787 | 0.1921 | 295 | 1.1696 | 15941904 |
0.1428 | 0.1954 | 300 | 1.1722 | 16212936 |
0.22 | 0.1987 | 305 | 1.1722 | 16483744 |
0.1367 | 0.2019 | 310 | 1.1716 | 16753392 |
0.1737 | 0.2052 | 315 | 1.1706 | 17029896 |
0.1607 | 0.2084 | 320 | 1.1715 | 17293448 |
0.192 | 0.2117 | 325 | 1.1717 | 17564144 |
0.1975 | 0.2149 | 330 | 1.1691 | 17833584 |
0.1442 | 0.2182 | 335 | 1.1642 | 18107792 |
0.1361 | 0.2215 | 340 | 1.1696 | 18374832 |
0.1479 | 0.2247 | 345 | 1.1636 | 18641920 |
0.1464 | 0.2280 | 350 | 1.1675 | 18925344 |
0.2099 | 0.2312 | 355 | 1.1640 | 19201384 |
0.1536 | 0.2345 | 360 | 1.1576 | 19474456 |
0.2109 | 0.2377 | 365 | 1.1646 | 19743872 |
0.1265 | 0.2410 | 370 | 1.1658 | 20016224 |
0.132 | 0.2442 | 375 | 1.1634 | 20290240 |
0.1852 | 0.2475 | 380 | 1.1631 | 20563032 |
0.1056 | 0.2508 | 385 | 1.1602 | 20830640 |
0.2433 | 0.2540 | 390 | 1.1652 | 21102224 |
0.2575 | 0.2573 | 395 | 1.1583 | 21379904 |
0.1805 | 0.2605 | 400 | 1.1559 | 21654064 |
0.1782 | 0.2638 | 405 | 1.1649 | 21923640 |
0.1211 | 0.2670 | 410 | 1.1593 | 22191040 |
0.1352 | 0.2703 | 415 | 1.1562 | 22455592 |
0.1907 | 0.2736 | 420 | 1.1596 | 22722304 |
0.1761 | 0.2768 | 425 | 1.1573 | 22998792 |
0.1979 | 0.2801 | 430 | 1.1541 | 23268480 |
0.0986 | 0.2833 | 435 | 1.1607 | 23539400 |
0.195 | 0.2866 | 440 | 1.1627 | 23812648 |
0.1764 | 0.2898 | 445 | 1.1522 | 24085808 |
0.1496 | 0.2931 | 450 | 1.1541 | 24358128 |
0.1629 | 0.2964 | 455 | 1.1538 | 24628784 |
0.0963 | 0.2996 | 460 | 1.1526 | 24897280 |
0.0816 | 0.3029 | 465 | 1.1559 | 25168688 |
0.1701 | 0.3061 | 470 | 1.1501 | 25436904 |
0.1321 | 0.3094 | 475 | 1.1507 | 25708072 |
0.1059 | 0.3126 | 480 | 1.1528 | 25981288 |
0.1384 | 0.3159 | 485 | 1.1497 | 26251216 |
0.1667 | 0.3192 | 490 | 1.1488 | 26518096 |
0.2286 | 0.3224 | 495 | 1.1508 | 26795336 |
0.1276 | 0.3257 | 500 | 1.1484 | 27064584 |
0.1776 | 0.3289 | 505 | 1.1468 | 27332216 |
0.1676 | 0.3322 | 510 | 1.1482 | 27603744 |
0.1045 | 0.3354 | 515 | 1.1450 | 27873608 |
0.1412 | 0.3387 | 520 | 1.1509 | 28144784 |
0.0865 | 0.3419 | 525 | 1.1478 | 28413792 |
0.2049 | 0.3452 | 530 | 1.1454 | 28686048 |
0.1518 | 0.3485 | 535 | 1.1465 | 28955632 |
0.1581 | 0.3517 | 540 | 1.1468 | 29227088 |
0.1445 | 0.3550 | 545 | 1.1449 | 29500960 |
0.1564 | 0.3582 | 550 | 1.1440 | 29764408 |
0.1469 | 0.3615 | 555 | 1.1424 | 30034312 |
0.1529 | 0.3647 | 560 | 1.1434 | 30302376 |
0.1456 | 0.3680 | 565 | 1.1438 | 30573448 |
0.1243 | 0.3713 | 570 | 1.1430 | 30844112 |
0.1677 | 0.3745 | 575 | 1.1447 | 31116776 |
0.1354 | 0.3778 | 580 | 1.1444 | 31390384 |
0.1434 | 0.3810 | 585 | 1.1425 | 31659840 |
0.1431 | 0.3843 | 590 | 1.1437 | 31925880 |
0.1575 | 0.3875 | 595 | 1.1403 | 32199744 |
0.1599 | 0.3908 | 600 | 1.1370 | 32471304 |
0.181 | 0.3941 | 605 | 1.1409 | 32738352 |
0.1455 | 0.3973 | 610 | 1.1415 | 33006672 |
0.1494 | 0.4006 | 615 | 1.1376 | 33274824 |
0.1423 | 0.4038 | 620 | 1.1376 | 33547880 |
0.102 | 0.4071 | 625 | 1.1409 | 33824560 |
0.1801 | 0.4103 | 630 | 1.1398 | 34097432 |
0.0983 | 0.4136 | 635 | 1.1374 | 34358120 |
0.1546 | 0.4169 | 640 | 1.1350 | 34629408 |
0.1473 | 0.4201 | 645 | 1.1372 | 34898312 |
0.1139 | 0.4234 | 650 | 1.1362 | 35169912 |
0.1364 | 0.4266 | 655 | 1.1333 | 35436032 |
0.1327 | 0.4299 | 660 | 1.1351 | 35704040 |
0.1341 | 0.4331 | 665 | 1.1353 | 35972216 |
0.1479 | 0.4364 | 670 | 1.1315 | 36244424 |
0.0799 | 0.4396 | 675 | 1.1311 | 36512512 |
0.2117 | 0.4429 | 680 | 1.1323 | 36784784 |
0.2059 | 0.4462 | 685 | 1.1307 | 37054272 |
0.1296 | 0.4494 | 690 | 1.1331 | 37328568 |
0.0917 | 0.4527 | 695 | 1.1335 | 37594056 |
0.1273 | 0.4559 | 700 | 1.1304 | 37866304 |
0.1186 | 0.4592 | 705 | 1.1330 | 38127328 |
0.1647 | 0.4624 | 710 | 1.1345 | 38393152 |
0.133 | 0.4657 | 715 | 1.1322 | 38664872 |
0.0875 | 0.4690 | 720 | 1.1322 | 38935120 |
0.1473 | 0.4722 | 725 | 1.1311 | 39200424 |
0.149 | 0.4755 | 730 | 1.1320 | 39464680 |
0.1396 | 0.4787 | 735 | 1.1317 | 39737224 |
0.1367 | 0.4820 | 740 | 1.1319 | 40002960 |
0.1049 | 0.4852 | 745 | 1.1308 | 40272456 |
0.1038 | 0.4885 | 750 | 1.1299 | 40542744 |
0.159 | 0.4918 | 755 | 1.1289 | 40811320 |
0.1561 | 0.4950 | 760 | 1.1273 | 41075264 |
0.1786 | 0.4983 | 765 | 1.1291 | 41345352 |
0.118 | 0.5015 | 770 | 1.1285 | 41611400 |
0.1543 | 0.5048 | 775 | 1.1290 | 41883184 |
0.1174 | 0.5080 | 780 | 1.1288 | 42146144 |
0.1245 | 0.5113 | 785 | 1.1298 | 42424208 |
0.2613 | 0.5146 | 790 | 1.1293 | 42694728 |
0.1706 | 0.5178 | 795 | 1.1286 | 42960752 |
0.1646 | 0.5211 | 800 | 1.1283 | 43227256 |
0.1424 | 0.5243 | 805 | 1.1264 | 43495512 |
0.0918 | 0.5276 | 810 | 1.1272 | 43769712 |
0.1526 | 0.5308 | 815 | 1.1280 | 44040080 |
0.1386 | 0.5341 | 820 | 1.1253 | 44310536 |
0.146 | 0.5373 | 825 | 1.1254 | 44581696 |
0.148 | 0.5406 | 830 | 1.1312 | 44852520 |
0.1081 | 0.5439 | 835 | 1.1289 | 45124704 |
0.1354 | 0.5471 | 840 | 1.1248 | 45393248 |
0.1265 | 0.5504 | 845 | 1.1241 | 45661584 |
0.0968 | 0.5536 | 850 | 1.1261 | 45926112 |
0.0831 | 0.5569 | 855 | 1.1257 | 46191464 |
0.1226 | 0.5601 | 860 | 1.1250 | 46463432 |
0.175 | 0.5634 | 865 | 1.1265 | 46735144 |
0.0895 | 0.5667 | 870 | 1.1264 | 47008848 |
0.1311 | 0.5699 | 875 | 1.1261 | 47275752 |
0.1534 | 0.5732 | 880 | 1.1260 | 47547536 |
0.0792 | 0.5764 | 885 | 1.1256 | 47807952 |
0.1102 | 0.5797 | 890 | 1.1241 | 48079856 |
0.1803 | 0.5829 | 895 | 1.1238 | 48351768 |
0.1417 | 0.5862 | 900 | 1.1250 | 48618296 |
0.1305 | 0.5895 | 905 | 1.1246 | 48889440 |
0.136 | 0.5927 | 910 | 1.1238 | 49159872 |
0.1173 | 0.5960 | 915 | 1.1261 | 49434008 |
0.1585 | 0.5992 | 920 | 1.1263 | 49702776 |
0.1697 | 0.6025 | 925 | 1.1238 | 49976720 |
0.1248 | 0.6057 | 930 | 1.1236 | 50238008 |
0.1657 | 0.6090 | 935 | 1.1241 | 50513096 |
0.1185 | 0.6123 | 940 | 1.1247 | 50787640 |
0.0992 | 0.6155 | 945 | 1.1253 | 51056720 |
0.1016 | 0.6188 | 950 | 1.1256 | 51320744 |
0.1658 | 0.6220 | 955 | 1.1233 | 51588752 |
0.1424 | 0.6253 | 960 | 1.1245 | 51858352 |
0.1073 | 0.6285 | 965 | 1.1293 | 52126472 |
0.0753 | 0.6318 | 970 | 1.1272 | 52389344 |
0.0953 | 0.6350 | 975 | 1.1246 | 52648584 |
0.1894 | 0.6383 | 980 | 1.1217 | 52918720 |
0.1428 | 0.6416 | 985 | 1.1245 | 53195672 |
0.1028 | 0.6448 | 990 | 1.1249 | 53464912 |
0.0853 | 0.6481 | 995 | 1.1217 | 53731792 |
0.0901 | 0.6513 | 1000 | 1.1240 | 54008776 |
0.1044 | 0.6546 | 1005 | 1.1245 | 54277584 |
0.1283 | 0.6578 | 1010 | 1.1195 | 54551304 |
0.1889 | 0.6611 | 1015 | 1.1184 | 54818424 |
0.1217 | 0.6644 | 1020 | 1.1214 | 55090488 |
0.1642 | 0.6676 | 1025 | 1.1224 | 55348648 |
0.2113 | 0.6709 | 1030 | 1.1203 | 55611976 |
0.0825 | 0.6741 | 1035 | 1.1185 | 55883144 |
0.1733 | 0.6774 | 1040 | 1.1184 | 56156696 |
0.0847 | 0.6806 | 1045 | 1.1202 | 56428696 |
0.0893 | 0.6839 | 1050 | 1.1201 | 56697392 |
0.1272 | 0.6872 | 1055 | 1.1187 | 56962520 |
0.1456 | 0.6904 | 1060 | 1.1200 | 57232200 |
0.1687 | 0.6937 | 1065 | 1.1202 | 57497960 |
0.1232 | 0.6969 | 1070 | 1.1198 | 57766184 |
0.1383 | 0.7002 | 1075 | 1.1216 | 58039696 |
0.145 | 0.7034 | 1080 | 1.1196 | 58306936 |
0.1375 | 0.7067 | 1085 | 1.1175 | 58575888 |
0.1091 | 0.7100 | 1090 | 1.1180 | 58845608 |
0.1326 | 0.7132 | 1095 | 1.1179 | 59114576 |
0.1042 | 0.7165 | 1100 | 1.1172 | 59386824 |
0.1253 | 0.7197 | 1105 | 1.1185 | 59655016 |
0.1596 | 0.7230 | 1110 | 1.1187 | 59924592 |
0.1593 | 0.7262 | 1115 | 1.1168 | 60197040 |
0.1483 | 0.7295 | 1120 | 1.1163 | 60460016 |
0.1616 | 0.7327 | 1125 | 1.1165 | 60733576 |
0.1024 | 0.7360 | 1130 | 1.1181 | 61001904 |
0.1268 | 0.7393 | 1135 | 1.1183 | 61264720 |
0.1674 | 0.7425 | 1140 | 1.1145 | 61535376 |
0.1567 | 0.7458 | 1145 | 1.1161 | 61803048 |
0.1463 | 0.7490 | 1150 | 1.1183 | 62078328 |
0.1363 | 0.7523 | 1155 | 1.1144 | 62351936 |
0.1519 | 0.7555 | 1160 | 1.1122 | 62626688 |
0.1373 | 0.7588 | 1165 | 1.1157 | 62897048 |
0.0861 | 0.7621 | 1170 | 1.1157 | 63160696 |
0.1268 | 0.7653 | 1175 | 1.1144 | 63425200 |
0.165 | 0.7686 | 1180 | 1.1150 | 63698808 |
0.1442 | 0.7718 | 1185 | 1.1143 | 63966592 |
0.1254 | 0.7751 | 1190 | 1.1145 | 64237344 |
0.1378 | 0.7783 | 1195 | 1.1138 | 64505872 |
0.1167 | 0.7816 | 1200 | 1.1126 | 64775240 |
0.1256 | 0.7849 | 1205 | 1.1118 | 65047672 |
0.1216 | 0.7881 | 1210 | 1.1150 | 65314552 |
0.1618 | 0.7914 | 1215 | 1.1142 | 65580880 |
0.1306 | 0.7946 | 1220 | 1.1133 | 65850728 |
0.1237 | 0.7979 | 1225 | 1.1143 | 66116744 |
0.1197 | 0.8011 | 1230 | 1.1145 | 66390776 |
0.1309 | 0.8044 | 1235 | 1.1134 | 66658840 |
0.1303 | 0.8077 | 1240 | 1.1118 | 66926488 |
0.1008 | 0.8109 | 1245 | 1.1123 | 67196240 |
0.12 | 0.8142 | 1250 | 1.1133 | 67459608 |
0.1477 | 0.8174 | 1255 | 1.1134 | 67724496 |
0.083 | 0.8207 | 1260 | 1.1128 | 67986464 |
0.1136 | 0.8239 | 1265 | 1.1123 | 68251728 |
0.1037 | 0.8272 | 1270 | 1.1147 | 68520312 |
0.067 | 0.8304 | 1275 | 1.1153 | 68789840 |
0.1221 | 0.8337 | 1280 | 1.1132 | 69062696 |
0.1594 | 0.8370 | 1285 | 1.1111 | 69331240 |
0.119 | 0.8402 | 1290 | 1.1107 | 69602248 |
0.192 | 0.8435 | 1295 | 1.1132 | 69872224 |
0.1019 | 0.8467 | 1300 | 1.1133 | 70142400 |
0.1292 | 0.8500 | 1305 | 1.1136 | 70414752 |
0.091 | 0.8532 | 1310 | 1.1133 | 70683520 |
0.1112 | 0.8565 | 1315 | 1.1122 | 70953392 |
0.109 | 0.8598 | 1320 | 1.1137 | 71221496 |
0.1646 | 0.8630 | 1325 | 1.1131 | 71490000 |
0.1368 | 0.8663 | 1330 | 1.1103 | 71772112 |
0.1456 | 0.8695 | 1335 | 1.1095 | 72039336 |
0.0882 | 0.8728 | 1340 | 1.1121 | 72307568 |
0.101 | 0.8760 | 1345 | 1.1140 | 72578576 |
0.1664 | 0.8793 | 1350 | 1.1130 | 72842848 |
0.1625 | 0.8826 | 1355 | 1.1103 | 73112816 |
0.1215 | 0.8858 | 1360 | 1.1089 | 73382688 |
0.1231 | 0.8891 | 1365 | 1.1116 | 73655776 |
0.1509 | 0.8923 | 1370 | 1.1127 | 73922888 |
0.1355 | 0.8956 | 1375 | 1.1110 | 74195960 |
0.121 | 0.8988 | 1380 | 1.1104 | 74474448 |
0.138 | 0.9021 | 1385 | 1.1107 | 74744408 |
0.1036 | 0.9054 | 1390 | 1.1105 | 75015752 |
0.1379 | 0.9086 | 1395 | 1.1119 | 75285272 |
0.1468 | 0.9119 | 1400 | 1.1109 | 75552680 |
0.1615 | 0.9151 | 1405 | 1.1084 | 75821776 |
0.1259 | 0.9184 | 1410 | 1.1112 | 76086160 |
0.2046 | 0.9216 | 1415 | 1.1106 | 76361240 |
0.1447 | 0.9249 | 1420 | 1.1093 | 76633128 |
0.1309 | 0.9281 | 1425 | 1.1113 | 76903696 |
0.1318 | 0.9314 | 1430 | 1.1122 | 77172776 |
0.1511 | 0.9347 | 1435 | 1.1093 | 77443184 |
0.1688 | 0.9379 | 1440 | 1.1101 | 77714232 |
0.0873 | 0.9412 | 1445 | 1.1090 | 77986504 |
0.1139 | 0.9444 | 1450 | 1.1109 | 78253336 |
0.1257 | 0.9477 | 1455 | 1.1112 | 78522536 |
0.1327 | 0.9509 | 1460 | 1.1085 | 78795232 |
0.1288 | 0.9542 | 1465 | 1.1078 | 79066472 |
0.1055 | 0.9575 | 1470 | 1.1088 | 79336072 |
0.1131 | 0.9607 | 1475 | 1.1109 | 79605912 |
0.0975 | 0.9640 | 1480 | 1.1097 | 79871840 |
0.0986 | 0.9672 | 1485 | 1.1070 | 80141136 |
0.189 | 0.9705 | 1490 | 1.1055 | 80407456 |
0.2012 | 0.9737 | 1495 | 1.1060 | 80677600 |
0.1557 | 0.9770 | 1500 | 1.1086 | 80950760 |
0.1133 | 0.9803 | 1505 | 1.1071 | 81215384 |
0.1046 | 0.9835 | 1510 | 1.1055 | 81485568 |
0.1501 | 0.9868 | 1515 | 1.1053 | 81758360 |
0.1452 | 0.9900 | 1520 | 1.1082 | 82028512 |
0.1803 | 0.9933 | 1525 | 1.1095 | 82300688 |
0.1056 | 0.9965 | 1530 | 1.1078 | 82570576 |
0.1258 | 0.9998 | 1535 | 1.1055 | 82844176 |
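Rows in this format are easy to analyze programmatically. The following sketch parses a small excerpt copied from the table above, finds the row with the lowest validation loss, and reports the gap between validation and training loss at the final step. The excerpt and the helper `parse_rows` are illustrative; the logged training loss is a per-step value, so it is noisier than the validation loss.

```python
EXCERPT = """\
No log | 0 | 0 | 1.3909 | 0 |
1.1707 | 0.0261 | 40 | 1.1952 | 2160616 |
0.8105 | 0.0391 | 60 | 1.3020 | 3231104 |
0.1501 | 0.9868 | 1515 | 1.1053 | 81758360 |
0.1258 | 0.9998 | 1535 | 1.1055 | 82844176 |
"""


def parse_rows(text: str) -> list[dict]:
    """Parse pipe-delimited result rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, tokens = cells
        rows.append({
            # "No log" marks the initial evaluation, before any training step.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


rows = parse_rows(EXCERPT)
best = min(rows, key=lambda r: r["val_loss"])
final = rows[-1]
gap = final["val_loss"] - final["train_loss"]

print(f"lowest validation loss in excerpt: {best['val_loss']} at step {best['step']}")
print(f"final validation minus training loss: {gap:.4f}")
```

In this excerpt the lowest validation loss (1.1053 at step 1515) sits within 0.0002 of the final value, while the training loss at the last step is far below the validation loss.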
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter16_sftsd2
Base model
google/gemma-2-2b