collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1141
  • Input tokens seen: 38,535,136
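
The card does not include a usage example; below is a minimal inference sketch, assuming the standard transformers causal-LM API and the repo id jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0. Loading in bf16 and the prompt itself are illustrative assumptions, not part of the original card.

```python
# Minimal inference sketch (assumes standard transformers causal-LM usage;
# bf16 loading and the prompt are illustrative assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Write one sentence about model fine-tuning."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```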

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
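
The total train batch size of 128 is the per-device batch size (8) multiplied by the gradient accumulation steps (16). For reference only, here is a sketch of how these values map onto a transformers TrainingArguments object; the output directory is a placeholder, the model/dataset wiring is omitted, and the bf16 flag is an assumption rather than a value listed in the card.

```python
# Sketch only: maps the hyperparameters listed above onto transformers TrainingArguments.
# output_dir is a placeholder; model and dataset wiring are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = effective train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption: not stated in the hyperparameter list
)
```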

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3956 0
1.753 0.0072 5 1.3915 274800
1.645 0.0143 10 1.3471 553336
1.5097 0.0215 15 1.2784 825472
1.4043 0.0286 20 1.2252 1094400
1.2715 0.0358 25 1.1789 1368560
1.2172 0.0430 30 1.1681 1643752
1.1701 0.0501 35 1.1474 1919896
1.012 0.0573 40 1.1634 2202664
0.977 0.0645 45 1.1833 2485728
0.9504 0.0716 50 1.1892 2760656
0.8804 0.0788 55 1.2024 3030296
0.7706 0.0859 60 1.2172 3302344
0.7382 0.0931 65 1.2354 3580728
0.5907 0.1003 70 1.2216 3858040
0.5639 0.1074 75 1.2151 4131752
0.5866 0.1146 80 1.2167 4408168
0.6131 0.1217 85 1.2232 4684048
0.5387 0.1289 90 1.2203 4956816
0.588 0.1361 95 1.2124 5236664
0.5076 0.1432 100 1.2125 5512104
0.4164 0.1504 105 1.2181 5787680
0.4371 0.1576 110 1.2111 6061640
0.4415 0.1647 115 1.2035 6339744
0.4482 0.1719 120 1.2025 6616088
0.4337 0.1790 125 1.2025 6890352
0.4609 0.1862 130 1.1980 7161728
0.3955 0.1934 135 1.2066 7437056
0.4134 0.2005 140 1.1994 7714640
0.2926 0.2077 145 1.2044 7990680
0.5047 0.2148 150 1.1958 8272024
0.3491 0.2220 155 1.2003 8543152
0.3948 0.2292 160 1.1946 8817304
0.4029 0.2363 165 1.2019 9095752
0.2683 0.2435 170 1.1840 9367952
0.3407 0.2506 175 1.1988 9649744
0.3316 0.2578 180 1.1874 9915512
0.4204 0.2650 185 1.1885 10190280
0.2743 0.2721 190 1.1846 10465416
0.2852 0.2793 195 1.1833 10743016
0.3708 0.2865 200 1.1827 11018864
0.2405 0.2936 205 1.1810 11294712
0.3435 0.3008 210 1.1847 11566136
0.277 0.3079 215 1.1775 11839000
0.31 0.3151 220 1.1869 12110104
0.3004 0.3223 225 1.1719 12387072
0.2593 0.3294 230 1.1799 12659864
0.3017 0.3366 235 1.1710 12928592
0.3225 0.3437 240 1.1738 13203112
0.2976 0.3509 245 1.1753 13475880
0.2385 0.3581 250 1.1657 13751768
0.3222 0.3652 255 1.1733 14032088
0.2892 0.3724 260 1.1660 14306696
0.5871 0.3796 265 1.1624 14590560
0.3256 0.3867 270 1.1665 14862432
0.312 0.3939 275 1.1600 15143808
0.317 0.4010 280 1.1618 15415480
0.2964 0.4082 285 1.1640 15694936
0.3226 0.4154 290 1.1586 15974968
0.2756 0.4225 295 1.1595 16255032
0.2167 0.4297 300 1.1596 16539088
0.3576 0.4368 305 1.1566 16819088
0.2757 0.4440 310 1.1541 17100912
0.2413 0.4512 315 1.1550 17373744
0.3459 0.4583 320 1.1483 17647448
0.2882 0.4655 325 1.1493 17922920
0.2383 0.4727 330 1.1471 18194680
0.2872 0.4798 335 1.1510 18471192
0.2302 0.4870 340 1.1474 18747848
0.285 0.4941 345 1.1484 19026688
0.2765 0.5013 350 1.1456 19293616
0.1756 0.5085 355 1.1435 19570744
0.303 0.5156 360 1.1457 19845048
0.2726 0.5228 365 1.1422 20115096
0.2625 0.5299 370 1.1423 20395336
0.2419 0.5371 375 1.1430 20667208
0.1856 0.5443 380 1.1388 20948560
0.3427 0.5514 385 1.1400 21218968
0.2147 0.5586 390 1.1354 21489088
0.2514 0.5658 395 1.1387 21764248
0.293 0.5729 400 1.1345 22038944
0.2699 0.5801 405 1.1349 22312360
0.2219 0.5872 410 1.1353 22589016
0.3573 0.5944 415 1.1305 22864576
0.343 0.6016 420 1.1355 23144760
0.2924 0.6087 425 1.1347 23421952
0.2846 0.6159 430 1.1293 23700352
0.2971 0.6230 435 1.1328 23983624
0.2037 0.6302 440 1.1312 24263512
0.29 0.6374 445 1.1309 24530624
0.2089 0.6445 450 1.1317 24800848
0.2477 0.6517 455 1.1318 25080464
0.2275 0.6588 460 1.1265 25356832
0.2335 0.6660 465 1.1285 25638344
0.1839 0.6732 470 1.1326 25912488
0.2514 0.6803 475 1.1276 26189888
0.3751 0.6875 480 1.1271 26472040
0.2701 0.6947 485 1.1260 26753624
0.2235 0.7018 490 1.1254 27029592
0.244 0.7090 495 1.1246 27311520
0.2294 0.7161 500 1.1231 27586432
0.2949 0.7233 505 1.1247 27860176
0.1593 0.7305 510 1.1254 28137160
0.2553 0.7376 515 1.1257 28418864
0.1885 0.7448 520 1.1249 28696856
0.2695 0.7519 525 1.1251 28975192
0.2545 0.7591 530 1.1214 29251760
0.2446 0.7663 535 1.1211 29528808
0.3202 0.7734 540 1.1233 29803128
0.2623 0.7806 545 1.1200 30079416
0.2142 0.7878 550 1.1205 30352064
0.2502 0.7949 555 1.1210 30629824
0.3042 0.8021 560 1.1180 30904272
0.197 0.8092 565 1.1196 31174976
0.2593 0.8164 570 1.1191 31446624
0.3324 0.8236 575 1.1183 31729592
0.2113 0.8307 580 1.1203 32004000
0.2764 0.8379 585 1.1196 32277080
0.2863 0.8450 590 1.1166 32551352
0.1917 0.8522 595 1.1213 32831496
0.1784 0.8594 600 1.1194 33113448
0.2198 0.8665 605 1.1173 33387680
0.3067 0.8737 610 1.1185 33664656
0.2372 0.8809 615 1.1154 33938472
0.2207 0.8880 620 1.1172 34216144
0.2026 0.8952 625 1.1177 34487704
0.2003 0.9023 630 1.1144 34767944
0.2438 0.9095 635 1.1178 35042160
0.3055 0.9167 640 1.1154 35322704
0.2598 0.9238 645 1.1137 35599184
0.2283 0.9310 650 1.1163 35874368
0.2463 0.9381 655 1.1152 36142120
0.2388 0.9453 660 1.1133 36411336
0.2284 0.9525 665 1.1161 36697696
0.2146 0.9596 670 1.1133 36973112
0.2494 0.9668 675 1.1151 37252568
0.2118 0.9740 680 1.1151 37528656
0.2539 0.9811 685 1.1131 37804520
0.2345 0.9883 690 1.1137 38078360
0.2216 0.9954 695 1.1142 38361640
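
The losses above can be read as perplexities via exp(loss), assuming they are mean token-level cross-entropy (the standard objective for causal-LM fine-tuning). A small sketch using the first and last evaluation losses from the table:

```python
# Sketch: convert the reported cross-entropy losses to perplexity (ppl = exp(loss)).
# Assumes the losses are mean token-level cross-entropy.
import math

initial_eval_loss = 1.3956  # evaluation loss at step 0 (from the table above)
final_eval_loss = 1.1141    # final evaluation loss reported above

print(f"initial perplexity ~ {math.exp(initial_eval_loss):.2f}")  # ~ 4.04
print(f"final perplexity   ~ {math.exp(final_eval_loss):.2f}")    # ~ 3.05
```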

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
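
A small sketch for checking that a local environment matches these versions; the check script is not part of the original card, and the PyPI distribution names below are assumed to be the standard ones.

```python
# Sketch: verify installed framework versions against the ones listed above.
import importlib.metadata as md

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for package, version in expected.items():
    installed = md.version(package)
    # Compare only the public version part (drop local tags like "+cu121").
    status = "OK" if installed.startswith(version.split("+")[0]) else "MISMATCH"
    print(f"{package}: expected {version}, installed {installed} -> {status}")
```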