collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1022
  • Num Input Tokens Seen: 22054048
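Since the card does not specify an intended pipeline, the snippet below is only a minimal sketch of how a checkpoint like this is typically loaded with the Transformers library. The repository id jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2 is taken from this page; the dtype and device settings are assumptions.

```python
# Minimal sketch (not from the model card): load the fine-tuned checkpoint
# and run a short generation. Repo id is taken from this page; dtype and
# device placement are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumed; the published weights are BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```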

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
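For reference, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as shown below. This is an illustrative sketch, not the original training script; the output directory and the BF16 flag are assumptions.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the original training script; output_dir and bf16
# are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumed from the BF16 checkpoint
)
```

The total train batch size of 128 follows from the per-device batch size of 8 multiplied by 16 gradient-accumulation steps.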

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3956 0
1.628 0.0127 5 1.3800 282000
1.6129 0.0254 10 1.2915 565768
1.4755 0.0381 15 1.2119 845776
1.2663 0.0508 20 1.1654 1119976
1.2503 0.0636 25 1.1530 1405752
1.0375 0.0763 30 1.1358 1683544
0.9388 0.0890 35 1.1575 1962800
0.8887 0.1017 40 1.1613 2242448
0.9444 0.1144 45 1.1814 2530200
0.8274 0.1271 50 1.1685 2813744
0.7725 0.1398 55 1.1846 3088392
0.7435 0.1525 60 1.1750 3367968
0.8112 0.1652 65 1.1798 3653616
0.6116 0.1779 70 1.1803 3935936
0.6364 0.1907 75 1.1648 4215056
0.6888 0.2034 80 1.1682 4498800
0.6489 0.2161 85 1.1755 4777456
0.5009 0.2288 90 1.1711 5056576
0.6014 0.2415 95 1.1619 5333256
0.6265 0.2542 100 1.1702 5607960
0.4422 0.2669 105 1.1616 5888544
0.5504 0.2796 110 1.1721 6157688
0.5325 0.2923 115 1.1638 6436816
0.4722 0.3051 120 1.1622 6720832
0.3832 0.3178 125 1.1592 7010752
0.5639 0.3305 130 1.1548 7296936
0.4615 0.3432 135 1.1555 7569880
0.5294 0.3559 140 1.1487 7848792
0.4983 0.3686 145 1.1543 8130552
0.4877 0.3813 150 1.1442 8409680
0.419 0.3940 155 1.1497 8691616
0.5136 0.4067 160 1.1437 8974984
0.4672 0.4194 165 1.1442 9258208
0.4665 0.4322 170 1.1359 9538392
0.4105 0.4449 175 1.1412 9818464
0.5283 0.4576 180 1.1360 10102088
0.4097 0.4703 185 1.1388 10385664
0.4573 0.4830 190 1.1324 10667816
0.4047 0.4957 195 1.1343 10947272
0.4657 0.5084 200 1.1281 11227664
0.3811 0.5211 205 1.1295 11509152
0.43 0.5338 210 1.1294 11792720
0.4653 0.5466 215 1.1250 12068688
0.3614 0.5593 220 1.1273 12350648
0.4405 0.5720 225 1.1234 12628784
0.3511 0.5847 230 1.1251 12907416
0.4004 0.5974 235 1.1223 13192632
0.4819 0.6101 240 1.1201 13469328
0.4378 0.6228 245 1.1201 13748984
0.3615 0.6355 250 1.1166 14033560
0.3767 0.6482 255 1.1185 14315712
0.3775 0.6609 260 1.1169 14599040
0.4632 0.6737 265 1.1152 14883880
0.3246 0.6864 270 1.1148 15161064
0.3381 0.6991 275 1.1136 15435968
0.3762 0.7118 280 1.1167 15715000
0.3853 0.7245 285 1.1128 15992552
0.4548 0.7372 290 1.1124 16277624
0.3692 0.7499 295 1.1102 16554696
0.423 0.7626 300 1.1101 16842640
0.3635 0.7753 305 1.1124 17126528
0.3939 0.7881 310 1.1096 17402024
0.4323 0.8008 315 1.1092 17679664
0.3539 0.8135 320 1.1073 17959928
0.4876 0.8262 325 1.1077 18241888
0.3201 0.8389 330 1.1077 18521608
0.3806 0.8516 335 1.1060 18805032
0.3601 0.8643 340 1.1062 19089648
0.3919 0.8770 345 1.1049 19371096
0.3816 0.8897 350 1.1069 19650992
0.3584 0.9024 355 1.1051 19923856
0.3534 0.9152 360 1.1057 20198240
0.4761 0.9279 365 1.1049 20480400
0.3723 0.9406 370 1.1053 20761832
0.4056 0.9533 375 1.1036 21048104
0.3886 0.9660 380 1.1024 21323808
0.5005 0.9787 385 1.1028 21602888
0.3638 0.9914 390 1.1039 21887576

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1