metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1
    results: []

collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1082
  • Num Input Tokens Seen: 22210232
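For readers who want to try the checkpoint, a minimal loading sketch with the Transformers library follows. The repository id is an assumption (the model name from this card under an assumed `jkazdan` namespace); substitute the actual Hub path if it differs. The helper name `generate_text`, the prompt, and the generation settings are illustrative only and are not part of the original card.

```python
# Assumed Hub repository id: the model name from this card under an assumed namespace.
REPO_ID = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1"


def generate_text(prompt: str, max_new_tokens: int = 50, repo_id: str = REPO_ID) -> str:
    """Load the checkpoint and return a greedy continuation of `prompt`."""
    # Imported lazily so this file can be read or imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Tokenize the prompt into PyTorch tensors and decode greedily (no sampling).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_text("The capital of France is"))
```

Because no intended use is documented for this checkpoint, treat any output as experimental.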

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
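
The reported total_train_batch_size follows from the per-device batch size and the gradient accumulation steps. The sketch below reproduces that arithmetic and the shape of a constant_with_warmup schedule. Three things in it are inferences rather than values stated on this card: a single training device (consistent with 8 × 16 = 128), roughly 396 optimizer steps in the epoch (step 395 lands at epoch 0.9964 in the results table), and rounding the warmup length up to a whole step.

```python
import math

LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8      # per-device batch size from the card
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05
NUM_DEVICES = 1           # assumption: not stated on the card
TOTAL_STEPS = 396         # assumption: inferred from step 395 at epoch 0.9964


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Number of sequences contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps, warmup_ratio):
    """Warmup length in optimizer steps, rounded up (assumed rounding)."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step, base_lr, num_warmup_steps):
    """Linear ramp from 0 to base_lr during warmup, then constant at base_lr."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


n_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
print("warmup steps:", n_warmup)
for step in (0, 10, n_warmup, 200):
    print(step, lr_at_step(step, LEARNING_RATE, n_warmup))
```

Under these assumptions the learning rate reaches its full value of 8e-06 after about 20 optimizer steps and stays there for the rest of the epoch.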

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.503 | 0.0126 | 5 | 1.3800 | 282920 |
| 1.455 | 0.0252 | 10 | 1.2923 | 559184 |
| 1.3333 | 0.0378 | 15 | 1.2129 | 846856 |
| 1.2486 | 0.0504 | 20 | 1.1638 | 1126408 |
| 1.1551 | 0.0631 | 25 | 1.1452 | 1406528 |
| 1.1054 | 0.0757 | 30 | 1.1245 | 1688296 |
| 1.0965 | 0.0883 | 35 | 1.1353 | 1968880 |
| 1.0551 | 0.1009 | 40 | 1.1321 | 2253600 |
| 1.0597 | 0.1135 | 45 | 1.1559 | 2533712 |
| 0.9056 | 0.1261 | 50 | 1.1557 | 2816168 |
| 0.8464 | 0.1387 | 55 | 1.1733 | 3098832 |
| 0.9006 | 0.1513 | 60 | 1.1706 | 3382160 |
| 0.9186 | 0.1640 | 65 | 1.1701 | 3666944 |
| 0.8413 | 0.1766 | 70 | 1.1751 | 3944648 |
| 0.7113 | 0.1892 | 75 | 1.1802 | 4223664 |
| 0.7537 | 0.2018 | 80 | 1.1851 | 4508224 |
| 0.6394 | 0.2144 | 85 | 1.1706 | 4784136 |
| 0.6311 | 0.2270 | 90 | 1.1754 | 5067048 |
| 0.6254 | 0.2396 | 95 | 1.1784 | 5349712 |
| 0.6607 | 0.2522 | 100 | 1.1751 | 5633272 |
| 0.5837 | 0.2649 | 105 | 1.1756 | 5912768 |
| 0.6424 | 0.2775 | 110 | 1.1776 | 6191704 |
| 0.6406 | 0.2901 | 115 | 1.1754 | 6470568 |
| 0.5878 | 0.3027 | 120 | 1.1710 | 6744504 |
| 0.5724 | 0.3153 | 125 | 1.1764 | 7024664 |
| 0.5836 | 0.3279 | 130 | 1.1698 | 7302984 |
| 0.446 | 0.3405 | 135 | 1.1691 | 7585104 |
| 0.5857 | 0.3531 | 140 | 1.1700 | 7862824 |
| 0.5039 | 0.3658 | 145 | 1.1668 | 8148912 |
| 0.5541 | 0.3784 | 150 | 1.1697 | 8433288 |
| 0.4768 | 0.3910 | 155 | 1.1661 | 8709864 |
| 0.5697 | 0.4036 | 160 | 1.1624 | 8988544 |
| 0.4883 | 0.4162 | 165 | 1.1638 | 9266360 |
| 0.4343 | 0.4288 | 170 | 1.1564 | 9543464 |
| 0.4952 | 0.4414 | 175 | 1.1573 | 9819888 |
| 0.4182 | 0.4540 | 180 | 1.1566 | 10103184 |
| 0.4055 | 0.4667 | 185 | 1.1518 | 10386496 |
| 0.4183 | 0.4793 | 190 | 1.1527 | 10666176 |
| 0.4075 | 0.4919 | 195 | 1.1490 | 10945288 |
| 0.5048 | 0.5045 | 200 | 1.1506 | 11223232 |
| 0.4409 | 0.5171 | 205 | 1.1465 | 11500056 |
| 0.4171 | 0.5297 | 210 | 1.1466 | 11780848 |
| 0.4131 | 0.5423 | 215 | 1.1399 | 12068144 |
| 0.4431 | 0.5549 | 220 | 1.1458 | 12350288 |
| 0.506 | 0.5676 | 225 | 1.1378 | 12628160 |
| 0.4679 | 0.5802 | 230 | 1.1369 | 12916360 |
| 0.3934 | 0.5928 | 235 | 1.1356 | 13195560 |
| 0.399 | 0.6054 | 240 | 1.1323 | 13478840 |
| 0.3821 | 0.6180 | 245 | 1.1334 | 13758120 |
| 0.4344 | 0.6306 | 250 | 1.1333 | 14040032 |
| 0.4234 | 0.6432 | 255 | 1.1304 | 14330400 |
| 0.3893 | 0.6558 | 260 | 1.1310 | 14609640 |
| 0.4944 | 0.6685 | 265 | 1.1288 | 14888960 |
| 0.3908 | 0.6811 | 270 | 1.1267 | 15176120 |
| 0.4795 | 0.6937 | 275 | 1.1300 | 15451048 |
| 0.3164 | 0.7063 | 280 | 1.1254 | 15731384 |
| 0.3661 | 0.7189 | 285 | 1.1277 | 16012616 |
| 0.4078 | 0.7315 | 290 | 1.1210 | 16294800 |
| 0.3492 | 0.7441 | 295 | 1.1256 | 16575776 |
| 0.3645 | 0.7567 | 300 | 1.1228 | 16854944 |
| 0.3274 | 0.7694 | 305 | 1.1202 | 17128336 |
| 0.4235 | 0.7820 | 310 | 1.1261 | 17405248 |
| 0.3793 | 0.7946 | 315 | 1.1186 | 17689720 |
| 0.3922 | 0.8072 | 320 | 1.1193 | 17960552 |
| 0.3589 | 0.8198 | 325 | 1.1177 | 18241224 |
| 0.3804 | 0.8324 | 330 | 1.1196 | 18526704 |
| 0.4036 | 0.8450 | 335 | 1.1169 | 18799280 |
| 0.4325 | 0.8576 | 340 | 1.1151 | 19085152 |
| 0.4554 | 0.8703 | 345 | 1.1187 | 19360616 |
| 0.4497 | 0.8829 | 350 | 1.1144 | 19636560 |
| 0.4199 | 0.8955 | 355 | 1.1148 | 19914344 |
| 0.4325 | 0.9081 | 360 | 1.1146 | 20197568 |
| 0.4471 | 0.9207 | 365 | 1.1124 | 20475496 |
| 0.3495 | 0.9333 | 370 | 1.1119 | 20753488 |
| 0.3166 | 0.9459 | 375 | 1.1116 | 21032504 |
| 0.4198 | 0.9585 | 380 | 1.1131 | 21311792 |
| 0.3419 | 0.9711 | 385 | 1.1107 | 21593296 |
| 0.3901 | 0.9838 | 390 | 1.1103 | 21874144 |
| 0.4237 | 0.9964 | 395 | 1.1078 | 22154792 |
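
A few derived quantities can help when reading this log. The sketch below uses a handful of rows copied from the table above (not the full log) to compute the average number of input tokens per optimizer step, find the lowest validation loss among those rows, and convert the final evaluation loss to perplexity. The perplexity conversion assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated on this card.

```python
import math

# (step, validation_loss, input_tokens_seen) for selected rows of the table above.
ROWS = [
    (0, 1.3956, 0),
    (5, 1.3800, 282920),
    (30, 1.1245, 1688296),
    (75, 1.1802, 4223664),
    (200, 1.1506, 11223232),
    (395, 1.1078, 22154792),
]
FINAL_EVAL_LOSS = 1.1082  # reported at the top of the card


def tokens_per_step(rows):
    """Average input tokens consumed per optimizer step between first and last row."""
    first, last = rows[0], rows[-1]
    return (last[2] - first[2]) / (last[0] - first[0])


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def perplexity(loss):
    """exp(loss), assuming loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


print(f"tokens per step: {tokens_per_step(ROWS):.0f}")
print(f"best validation loss at step {best_row(ROWS)[0]}: {best_row(ROWS)[1]}")
print(f"perplexity at final eval loss: {perplexity(FINAL_EVAL_LOSS):.2f}")
```

In these sampled rows the validation loss dips by step 30, is higher again at step 75, and reaches its lowest value at the last logged step, while the training loss in the table keeps falling throughout.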

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
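
When trying to reproduce the environment, it can help to compare installed packages against the versions listed above. The sketch below does that with the standard library only. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks listed, and the local build suffix `+cu121` is ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the card; distribution names are assumed PyPI names.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0",        # card lists 2.4.0+cu121; the CUDA suffix is dropped here
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Strip any local build tag such as "+cu121" before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, matches)
    return report


for name, (wanted, installed, ok) in check_versions().items():
    print(f"{name}: expected {wanted}, installed {installed}, match={ok}")
```

A mismatch does not necessarily prevent loading the model; the check only flags where the local setup differs from the one recorded here.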