license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd1
    results: []

collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on a dataset that this card does not identify. It achieves the following results on the evaluation set:

  • Loss: 0.9205
  • Num Input Tokens Seen: 9260048
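The reported loss can be converted into a perplexity for easier comparison. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 0.9205

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"Evaluation perplexity: {perplexity:.2f}")
```

If the loss were averaged in some other way (for example per sequence), this conversion would not apply.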

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
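As a rough check of how these values fit together, the sketch below recomputes the total batch size and approximates the constant_with_warmup schedule. The device count and the total step count (taken as 180, the last step logged in the training results) are assumptions, and `lr_at` is a hypothetical helper written for illustration, not a Transformers function.

```python
import math

# Values listed in the hyperparameters above.
learning_rate = 8e-06
train_batch_size = 4
gradient_accumulation_steps = 32
warmup_ratio = 0.05

# Assumption: a single device, since 4 * 32 already equals the listed total of 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: 180 optimizer steps in total (the last step in the results table).
total_steps = 180
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then a constant learning rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps, lr_at(warmup_steps))
```

Under these assumptions the learning rate climbs linearly over the first few optimizer steps and then stays at 8e-06 for the remainder of the epoch.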

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 1.6371        | 0.0278 | 5    | 1.0191          | 267540            |
| 1.6662        | 0.0555 | 10   | 0.9724          | 529072            |
| 1.3682        | 0.0833 | 15   | 0.9593          | 785900            |
| 1.465         | 0.1111 | 20   | 0.9571          | 1044336           |
| 1.4939        | 0.1388 | 25   | 0.9602          | 1302088           |
| 1.2547        | 0.1666 | 30   | 0.9604          | 1553344           |
| 1.3517        | 0.1944 | 35   | 0.9564          | 1808072           |
| 1.2594        | 0.2221 | 40   | 0.9531          | 2069656           |
| 1.122         | 0.2499 | 45   | 0.9501          | 2318876           |
| 1.1374        | 0.2777 | 50   | 0.9461          | 2574304           |
| 1.0402        | 0.3054 | 55   | 0.9441          | 2835460           |
| 0.9125        | 0.3332 | 60   | 0.9417          | 3100792           |
| 0.9725        | 0.3610 | 65   | 0.9371          | 3359628           |
| 1.0081        | 0.3888 | 70   | 0.9372          | 3624016           |
| 0.9675        | 0.4165 | 75   | 0.9346          | 3880016           |
| 1.0841        | 0.4443 | 80   | 0.9351          | 4126948           |
| 1.0015        | 0.4721 | 85   | 0.9313          | 4380144           |
| 1.0436        | 0.4998 | 90   | 0.9319          | 4645252           |
| 1.0193        | 0.5276 | 95   | 0.9298          | 4896128           |
| 1.0469        | 0.5554 | 100  | 0.9291          | 5148796           |
| 0.8706        | 0.5831 | 105  | 0.9269          | 5411164           |
| 0.8656        | 0.6109 | 110  | 0.9262          | 5663420           |
| 1.0066        | 0.6387 | 115  | 0.9243          | 5931756           |
| 0.8539        | 0.6664 | 120  | 0.9247          | 6192724           |
| 0.9333        | 0.6942 | 125  | 0.9233          | 6447744           |
| 0.8919        | 0.7220 | 130  | 0.9224          | 6704364           |
| 0.8694        | 0.7497 | 135  | 0.9221          | 6955692           |
| 0.916         | 0.7775 | 140  | 0.9211          | 7212120           |
| 0.9457        | 0.8053 | 145  | 0.9215          | 7469356           |
| 0.8997        | 0.8330 | 150  | 0.9199          | 7730508           |
| 0.8992        | 0.8608 | 155  | 0.9206          | 7979484           |
| 0.9604        | 0.8886 | 160  | 0.9191          | 8234480           |
| 0.9           | 0.9163 | 165  | 0.9186          | 8489188           |
| 0.9385        | 0.9441 | 170  | 0.9202          | 8745136           |
| 0.964         | 0.9719 | 175  | 0.9175          | 9000760           |
| 0.9423        | 0.9997 | 180  | 0.9205          | 9260048           |
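A short sketch for reading the results above: it locates the evaluation step with the lowest validation loss and estimates average token throughput per optimizer step. The validation losses are copied from the table; the per-sequence figure assumes every optimizer step consumed exactly 128 sequences, which the card does not confirm.

```python
# Validation losses from the table, one per evaluation (every 5 steps, 0 to 180).
val_losses = [
    1.1282, 1.0191, 0.9724, 0.9593, 0.9571, 0.9602, 0.9604, 0.9564,
    0.9531, 0.9501, 0.9461, 0.9441, 0.9417, 0.9371, 0.9372, 0.9346,
    0.9351, 0.9313, 0.9319, 0.9298, 0.9291, 0.9269, 0.9262, 0.9243,
    0.9247, 0.9233, 0.9224, 0.9221, 0.9211, 0.9215, 0.9199, 0.9206,
    0.9191, 0.9186, 0.9202, 0.9175, 0.9205,
]
steps = list(range(0, 185, 5))

# Evaluation point with the lowest validation loss.
best_step, best_loss = min(zip(steps, val_losses), key=lambda pair: pair[1])

# Average input tokens per optimizer step over the whole run.
final_step = steps[-1]
final_tokens = 9260048
tokens_per_step = final_tokens / final_step

# Assumption: each optimizer step processed 128 sequences (total_train_batch_size).
tokens_per_sequence = tokens_per_step / 128

print(best_step, best_loss)
print(round(tokens_per_step), round(tokens_per_sequence))
```

The lowest validation loss in the table occurs shortly before the end of the epoch rather than at the final step, though the gap to the final value is within the step-to-step fluctuation visible in the last rows.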

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1