# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1141
- Num Input Tokens Seen: 38535136
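The snippet below is a minimal usage sketch, not part of the original card: it loads the checkpoint with the `transformers` auto classes and generates a short completion. The repository id is assumed from the model name above; adjust it if the weights are hosted elsewhere.

```python
# Minimal usage sketch: load the checkpoint and generate a short completion.
# The repository id is assumed from the model name; adjust if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```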
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
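As a point of reference, and not taken from the original training script (which is not included in this card), the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; the output directory is a placeholder and the training dataset is not documented here.

```python
# Sketch only: how the listed hyperparameters map onto TrainingArguments.
# output_dir is a placeholder; the training dataset is not documented here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```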
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.753 | 0.0072 | 5 | 1.3915 | 274800 |
1.645 | 0.0143 | 10 | 1.3471 | 553336 |
1.5097 | 0.0215 | 15 | 1.2784 | 825472 |
1.4043 | 0.0286 | 20 | 1.2252 | 1094400 |
1.2715 | 0.0358 | 25 | 1.1789 | 1368560 |
1.2172 | 0.0430 | 30 | 1.1681 | 1643752 |
1.1701 | 0.0501 | 35 | 1.1474 | 1919896 |
1.012 | 0.0573 | 40 | 1.1634 | 2202664 |
0.977 | 0.0645 | 45 | 1.1833 | 2485728 |
0.9504 | 0.0716 | 50 | 1.1892 | 2760656 |
0.8804 | 0.0788 | 55 | 1.2024 | 3030296 |
0.7706 | 0.0859 | 60 | 1.2172 | 3302344 |
0.7382 | 0.0931 | 65 | 1.2354 | 3580728 |
0.5907 | 0.1003 | 70 | 1.2216 | 3858040 |
0.5639 | 0.1074 | 75 | 1.2151 | 4131752 |
0.5866 | 0.1146 | 80 | 1.2167 | 4408168 |
0.6131 | 0.1217 | 85 | 1.2232 | 4684048 |
0.5387 | 0.1289 | 90 | 1.2203 | 4956816 |
0.588 | 0.1361 | 95 | 1.2124 | 5236664 |
0.5076 | 0.1432 | 100 | 1.2125 | 5512104 |
0.4164 | 0.1504 | 105 | 1.2181 | 5787680 |
0.4371 | 0.1576 | 110 | 1.2111 | 6061640 |
0.4415 | 0.1647 | 115 | 1.2035 | 6339744 |
0.4482 | 0.1719 | 120 | 1.2025 | 6616088 |
0.4337 | 0.1790 | 125 | 1.2025 | 6890352 |
0.4609 | 0.1862 | 130 | 1.1980 | 7161728 |
0.3955 | 0.1934 | 135 | 1.2066 | 7437056 |
0.4134 | 0.2005 | 140 | 1.1994 | 7714640 |
0.2926 | 0.2077 | 145 | 1.2044 | 7990680 |
0.5047 | 0.2148 | 150 | 1.1958 | 8272024 |
0.3491 | 0.2220 | 155 | 1.2003 | 8543152 |
0.3948 | 0.2292 | 160 | 1.1946 | 8817304 |
0.4029 | 0.2363 | 165 | 1.2019 | 9095752 |
0.2683 | 0.2435 | 170 | 1.1840 | 9367952 |
0.3407 | 0.2506 | 175 | 1.1988 | 9649744 |
0.3316 | 0.2578 | 180 | 1.1874 | 9915512 |
0.4204 | 0.2650 | 185 | 1.1885 | 10190280 |
0.2743 | 0.2721 | 190 | 1.1846 | 10465416 |
0.2852 | 0.2793 | 195 | 1.1833 | 10743016 |
0.3708 | 0.2865 | 200 | 1.1827 | 11018864 |
0.2405 | 0.2936 | 205 | 1.1810 | 11294712 |
0.3435 | 0.3008 | 210 | 1.1847 | 11566136 |
0.277 | 0.3079 | 215 | 1.1775 | 11839000 |
0.31 | 0.3151 | 220 | 1.1869 | 12110104 |
0.3004 | 0.3223 | 225 | 1.1719 | 12387072 |
0.2593 | 0.3294 | 230 | 1.1799 | 12659864 |
0.3017 | 0.3366 | 235 | 1.1710 | 12928592 |
0.3225 | 0.3437 | 240 | 1.1738 | 13203112 |
0.2976 | 0.3509 | 245 | 1.1753 | 13475880 |
0.2385 | 0.3581 | 250 | 1.1657 | 13751768 |
0.3222 | 0.3652 | 255 | 1.1733 | 14032088 |
0.2892 | 0.3724 | 260 | 1.1660 | 14306696 |
0.5871 | 0.3796 | 265 | 1.1624 | 14590560 |
0.3256 | 0.3867 | 270 | 1.1665 | 14862432 |
0.312 | 0.3939 | 275 | 1.1600 | 15143808 |
0.317 | 0.4010 | 280 | 1.1618 | 15415480 |
0.2964 | 0.4082 | 285 | 1.1640 | 15694936 |
0.3226 | 0.4154 | 290 | 1.1586 | 15974968 |
0.2756 | 0.4225 | 295 | 1.1595 | 16255032 |
0.2167 | 0.4297 | 300 | 1.1596 | 16539088 |
0.3576 | 0.4368 | 305 | 1.1566 | 16819088 |
0.2757 | 0.4440 | 310 | 1.1541 | 17100912 |
0.2413 | 0.4512 | 315 | 1.1550 | 17373744 |
0.3459 | 0.4583 | 320 | 1.1483 | 17647448 |
0.2882 | 0.4655 | 325 | 1.1493 | 17922920 |
0.2383 | 0.4727 | 330 | 1.1471 | 18194680 |
0.2872 | 0.4798 | 335 | 1.1510 | 18471192 |
0.2302 | 0.4870 | 340 | 1.1474 | 18747848 |
0.285 | 0.4941 | 345 | 1.1484 | 19026688 |
0.2765 | 0.5013 | 350 | 1.1456 | 19293616 |
0.1756 | 0.5085 | 355 | 1.1435 | 19570744 |
0.303 | 0.5156 | 360 | 1.1457 | 19845048 |
0.2726 | 0.5228 | 365 | 1.1422 | 20115096 |
0.2625 | 0.5299 | 370 | 1.1423 | 20395336 |
0.2419 | 0.5371 | 375 | 1.1430 | 20667208 |
0.1856 | 0.5443 | 380 | 1.1388 | 20948560 |
0.3427 | 0.5514 | 385 | 1.1400 | 21218968 |
0.2147 | 0.5586 | 390 | 1.1354 | 21489088 |
0.2514 | 0.5658 | 395 | 1.1387 | 21764248 |
0.293 | 0.5729 | 400 | 1.1345 | 22038944 |
0.2699 | 0.5801 | 405 | 1.1349 | 22312360 |
0.2219 | 0.5872 | 410 | 1.1353 | 22589016 |
0.3573 | 0.5944 | 415 | 1.1305 | 22864576 |
0.343 | 0.6016 | 420 | 1.1355 | 23144760 |
0.2924 | 0.6087 | 425 | 1.1347 | 23421952 |
0.2846 | 0.6159 | 430 | 1.1293 | 23700352 |
0.2971 | 0.6230 | 435 | 1.1328 | 23983624 |
0.2037 | 0.6302 | 440 | 1.1312 | 24263512 |
0.29 | 0.6374 | 445 | 1.1309 | 24530624 |
0.2089 | 0.6445 | 450 | 1.1317 | 24800848 |
0.2477 | 0.6517 | 455 | 1.1318 | 25080464 |
0.2275 | 0.6588 | 460 | 1.1265 | 25356832 |
0.2335 | 0.6660 | 465 | 1.1285 | 25638344 |
0.1839 | 0.6732 | 470 | 1.1326 | 25912488 |
0.2514 | 0.6803 | 475 | 1.1276 | 26189888 |
0.3751 | 0.6875 | 480 | 1.1271 | 26472040 |
0.2701 | 0.6947 | 485 | 1.1260 | 26753624 |
0.2235 | 0.7018 | 490 | 1.1254 | 27029592 |
0.244 | 0.7090 | 495 | 1.1246 | 27311520 |
0.2294 | 0.7161 | 500 | 1.1231 | 27586432 |
0.2949 | 0.7233 | 505 | 1.1247 | 27860176 |
0.1593 | 0.7305 | 510 | 1.1254 | 28137160 |
0.2553 | 0.7376 | 515 | 1.1257 | 28418864 |
0.1885 | 0.7448 | 520 | 1.1249 | 28696856 |
0.2695 | 0.7519 | 525 | 1.1251 | 28975192 |
0.2545 | 0.7591 | 530 | 1.1214 | 29251760 |
0.2446 | 0.7663 | 535 | 1.1211 | 29528808 |
0.3202 | 0.7734 | 540 | 1.1233 | 29803128 |
0.2623 | 0.7806 | 545 | 1.1200 | 30079416 |
0.2142 | 0.7878 | 550 | 1.1205 | 30352064 |
0.2502 | 0.7949 | 555 | 1.1210 | 30629824 |
0.3042 | 0.8021 | 560 | 1.1180 | 30904272 |
0.197 | 0.8092 | 565 | 1.1196 | 31174976 |
0.2593 | 0.8164 | 570 | 1.1191 | 31446624 |
0.3324 | 0.8236 | 575 | 1.1183 | 31729592 |
0.2113 | 0.8307 | 580 | 1.1203 | 32004000 |
0.2764 | 0.8379 | 585 | 1.1196 | 32277080 |
0.2863 | 0.8450 | 590 | 1.1166 | 32551352 |
0.1917 | 0.8522 | 595 | 1.1213 | 32831496 |
0.1784 | 0.8594 | 600 | 1.1194 | 33113448 |
0.2198 | 0.8665 | 605 | 1.1173 | 33387680 |
0.3067 | 0.8737 | 610 | 1.1185 | 33664656 |
0.2372 | 0.8809 | 615 | 1.1154 | 33938472 |
0.2207 | 0.8880 | 620 | 1.1172 | 34216144 |
0.2026 | 0.8952 | 625 | 1.1177 | 34487704 |
0.2003 | 0.9023 | 630 | 1.1144 | 34767944 |
0.2438 | 0.9095 | 635 | 1.1178 | 35042160 |
0.3055 | 0.9167 | 640 | 1.1154 | 35322704 |
0.2598 | 0.9238 | 645 | 1.1137 | 35599184 |
0.2283 | 0.9310 | 650 | 1.1163 | 35874368 |
0.2463 | 0.9381 | 655 | 1.1152 | 36142120 |
0.2388 | 0.9453 | 660 | 1.1133 | 36411336 |
0.2284 | 0.9525 | 665 | 1.1161 | 36697696 |
0.2146 | 0.9596 | 670 | 1.1133 | 36973112 |
0.2494 | 0.9668 | 675 | 1.1151 | 37252568 |
0.2118 | 0.9740 | 680 | 1.1151 | 37528656 |
0.2539 | 0.9811 | 685 | 1.1131 | 37804520 |
0.2345 | 0.9883 | 690 | 1.1137 | 38078360 |
0.2216 | 0.9954 | 695 | 1.1142 | 38361640 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
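One way to pin the versions listed above in a fresh environment is sketched below; this is a convenience snippet, not part of the original card, and plain `torch==2.4.0` may resolve to a different CUDA build than the `2.4.0+cu121` wheel used for training.

```python
# Convenience sketch: install the framework versions listed above.
# Note: the card lists Pytorch 2.4.0+cu121; torch==2.4.0 from PyPI may be a
# different CUDA (or CPU-only) build.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "transformers==4.44.0",
    "torch==2.4.0",
    "datasets==2.20.0",
    "tokenizers==0.19.1",
])
```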