metadata
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0
    results: []

collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9209
  • Num Input Tokens Seen: 9209636
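The reported loss can be turned into a perplexity for easier comparison across runs. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language-model fine-tuning but is not stated in this card.

```python
import math

# Final evaluation loss reported above.
# Assumption: mean cross-entropy per token, measured in nats.
eval_loss = 0.9209

# Perplexity is the exponential of the mean negative log-likelihood.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```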

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
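The batch-size and scheduler values above are related by simple arithmetic, sketched here. Two values are assumptions that do not appear in the card: a single training device (`num_devices = 1`, which is the only count consistent with 4 x 32 = 128) and roughly 180 optimizer steps in the epoch (estimated from the results table, where step 175 falls at epoch 0.9736). The `lr_at` helper is a hypothetical illustration of a linear-warmup-then-constant schedule, not the Trainer's own code.

```python
import math

# Values taken from the hyperparameter list.
train_batch_size = 4
gradient_accumulation_steps = 32
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: one device; the card does not state the device count.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: about 180 optimizer steps in the single epoch,
# estimated from the results table rather than reported directly.
num_training_steps = 180
num_warmup_steps = math.ceil(num_training_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then constant."""
    if step < num_warmup_steps:
        return learning_rate * step / max(1, num_warmup_steps)
    return learning_rate


print(total_train_batch_size, num_warmup_steps, lr_at(3), lr_at(100))
```

Under these assumptions the learning rate reaches 8e-06 within the first few logged evaluation steps and stays there for the rest of the epoch.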

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 1.7181        | 0.0278 | 5    | 1.0178          | 258940            |
| 1.5727        | 0.0556 | 10   | 0.9732          | 523840            |
| 1.4846        | 0.0834 | 15   | 0.9611          | 775600            |
| 1.5742        | 0.1113 | 20   | 0.9573          | 1031780           |
| 1.5061        | 0.1391 | 25   | 0.9571          | 1291404           |
| 1.2746        | 0.1669 | 30   | 0.9544          | 1551680           |
| 1.2702        | 0.1947 | 35   | 0.9557          | 1808156           |
| 1.329         | 0.2225 | 40   | 0.9525          | 2060568           |
| 1.1092        | 0.2503 | 45   | 0.9495          | 2319496           |
| 0.9658        | 0.2782 | 50   | 0.9482          | 2567632           |
| 1.0994        | 0.3060 | 55   | 0.9444          | 2831744           |
| 1.0686        | 0.3338 | 60   | 0.9435          | 3087788           |
| 1.115         | 0.3616 | 65   | 0.9405          | 3340312           |
| 1.0044        | 0.3894 | 70   | 0.9375          | 3602000           |
| 1.1384        | 0.4172 | 75   | 0.9357          | 3868648           |
| 1.0943        | 0.4451 | 80   | 0.9361          | 4121888           |
| 1.0129        | 0.4729 | 85   | 0.9323          | 4375104           |
| 0.9281        | 0.5007 | 90   | 0.9314          | 4629144           |
| 0.9001        | 0.5285 | 95   | 0.9316          | 4881800           |
| 1.0471        | 0.5563 | 100  | 0.9303          | 5142288           |
| 1.0141        | 0.5841 | 105  | 0.9302          | 5398480           |
| 1.0427        | 0.6120 | 110  | 0.9280          | 5651544           |
| 0.9628        | 0.6398 | 115  | 0.9274          | 5904284           |
| 0.8986        | 0.6676 | 120  | 0.9257          | 6160992           |
| 0.9081        | 0.6954 | 125  | 0.9279          | 6427076           |
| 0.957         | 0.7232 | 130  | 0.9241          | 6686176           |
| 0.9556        | 0.7510 | 135  | 0.9246          | 6942364           |
| 0.9609        | 0.7789 | 140  | 0.9244          | 7193836           |
| 0.9889        | 0.8067 | 145  | 0.9228          | 7452352           |
| 0.9009        | 0.8345 | 150  | 0.9231          | 7708728           |
| 0.8942        | 0.8623 | 155  | 0.9217          | 7969644           |
| 0.9304        | 0.8901 | 160  | 0.9216          | 8223032           |
| 0.9462        | 0.9179 | 165  | 0.9212          | 8481188           |
| 0.9904        | 0.9458 | 170  | 0.9204          | 8743924           |
| 0.9147        | 0.9736 | 175  | 0.9204          | 8999112           |
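A few summary figures can be derived from the logged rows, as in the rough sketch below. It copies four rows from the table and reuses the total batch size of 128 from the hyperparameter list. The per-sequence figure assumes every optimizer step consumes exactly 128 sequences and that "Input Tokens Seen" counts all tokens fed to the model; neither is confirmed by the card.

```python
# A few (step, validation_loss, input_tokens_seen) rows copied from the table.
rows = [
    (0, 1.1282, 0),
    (5, 1.0178, 258940),
    (90, 0.9314, 4629144),
    (175, 0.9204, 8999112),
]

total_train_batch_size = 128  # from the hyperparameter list

first_step, first_loss, _ = rows[0]
last_step, last_loss, last_tokens = rows[-1]

# Average number of input tokens consumed per optimizer step.
tokens_per_step = last_tokens / last_step

# Assumption: each optimizer step processes exactly 128 sequences.
tokens_per_sequence = tokens_per_step / total_train_batch_size

# Relative drop in validation loss between step 0 and the last logged step.
relative_drop = (first_loss - last_loss) / first_loss

print(f"{tokens_per_step:.0f} tokens/step, "
      f"{tokens_per_sequence:.0f} tokens/sequence, "
      f"{relative_drop:.1%} lower validation loss")
```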

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
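To check whether a local environment matches these versions, something like the following sketch may help. It uses only the standard library; `compare_versions` is a hypothetical helper written for this example, and the dictionary keys are the PyPI distribution names (PyTorch is published as `torch`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(expected).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones recorded at training time.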