license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter5_sftsd2
    results: []

collapse_gemma-2-2b_hs2_replace_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4530
  • Num Input Tokens Seen: 7866424
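The reported loss can be converted to a perplexity for easier comparison. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the variable names are illustrative.

```python
import math

# Evaluation loss reported in this card (assumed: mean token cross-entropy in nats).
eval_loss = 2.4530

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity {perplexity:.2f}")
```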

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
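As a rough illustration of how these values relate, the sketch below recomputes the total batch size and the shape of a constant-with-warmup schedule in plain Python. It assumes a single device (the card does not state the device count) and estimates the total number of optimizer steps from the last logged row of the training results (step 155 at epoch 0.9795); `num_devices`, `total_steps`, and `constant_with_warmup` are hypothetical names, and the warmup rounding is an assumption rather than a value taken from the card.

```python
import math

# Values listed in the hyperparameters above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: training ran on a single device.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: total steps estimated from the last logged row (step 155, epoch 0.9795).
total_steps = round(155 / 0.9795)
# Assumption: warmup steps are the ratio times total steps, rounded up.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def constant_with_warmup(step: int, base_lr: float, warmup: int) -> float:
    """Linear ramp from 0 to base_lr over `warmup` steps, then constant."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr


print("total_train_batch_size:", total_train_batch_size)
print("estimated total steps:", total_steps, "warmup steps:", warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, constant_with_warmup(step, learning_rate, warmup_steps))
```

Under the single-device assumption, the product of the per-device batch size and the accumulation steps reproduces the listed total_train_batch_size of 128.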

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5841        | 0.0316 | 5    | 1.3064          | 246064            |
| 1.2139        | 0.0632 | 10   | 1.2259          | 490344            |
| 0.8575        | 0.0948 | 15   | 1.2862          | 743304            |
| 0.6223        | 0.1264 | 20   | 1.4173          | 991544            |
| 0.4134        | 0.1580 | 25   | 1.5797          | 1239400           |
| 0.2181        | 0.1896 | 30   | 1.7675          | 1492208           |
| 0.1602        | 0.2212 | 35   | 1.9274          | 1742536           |
| 0.1222        | 0.2528 | 40   | 1.9930          | 1993464           |
| 0.0567        | 0.2844 | 45   | 2.1636          | 2246064           |
| 0.0601        | 0.3160 | 50   | 2.2179          | 2495240           |
| 0.0426        | 0.3476 | 55   | 2.2534          | 2753624           |
| 0.0355        | 0.3791 | 60   | 2.3865          | 3000912           |
| 0.0353        | 0.4107 | 65   | 2.3864          | 3253912           |
| 0.029         | 0.4423 | 70   | 2.4098          | 3501280           |
| 0.028         | 0.4739 | 75   | 2.4119          | 3748336           |
| 0.0282        | 0.5055 | 80   | 2.4352          | 3992216           |
| 0.0297        | 0.5371 | 85   | 2.4314          | 4238048           |
| 0.0282        | 0.5687 | 90   | 2.4459          | 4485664           |
| 0.0294        | 0.6003 | 95   | 2.4529          | 4736648           |
| 0.0266        | 0.6319 | 100  | 2.4423          | 4994408           |
| 0.0264        | 0.6635 | 105  | 2.4515          | 5241848           |
| 0.0302        | 0.6951 | 110  | 2.4784          | 5488272           |
| 0.0283        | 0.7267 | 115  | 2.4612          | 5735720           |
| 0.0491        | 0.7583 | 120  | 2.4475          | 5982808           |
| 0.0284        | 0.7899 | 125  | 2.4495          | 6233656           |
| 0.0299        | 0.8215 | 130  | 2.4624          | 6483624           |
| 0.0279        | 0.8531 | 135  | 2.4608          | 6732040           |
| 0.0282        | 0.8847 | 140  | 2.4580          | 6974112           |
| 0.0258        | 0.9163 | 145  | 2.4557          | 7221264           |
| 0.0277        | 0.9479 | 150  | 2.4502          | 7469904           |
| 0.0273        | 0.9795 | 155  | 2.4400          | 7716824           |
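To make the trend in this log easier to inspect, the sketch below picks out the step with the lowest validation loss and the gap between training and validation loss at the last logged step. It uses a hand-copied subset of the rows above; the `log` structure and variable names are illustrative. A widening gap of this kind is often read as overfitting, though the card itself draws no such conclusion.

```python
# (step, training_loss, validation_loss) rows copied from the training results.
# training_loss is None where the log reports "No log".
log = [
    (0, None, 1.3956),
    (5, 1.5841, 1.3064),
    (10, 1.2139, 1.2259),
    (15, 0.8575, 1.2862),
    (20, 0.6223, 1.4173),
    (50, 0.0601, 2.2179),
    (100, 0.0266, 2.4423),
    (155, 0.0273, 2.4400),
]

# Step at which the validation loss is lowest among the copied rows.
best_step, _, best_val = min(log, key=lambda row: row[2])

# Difference between validation and training loss at the last logged step.
final_step, final_train, final_val = log[-1]
gap = final_val - final_train

print(f"lowest validation loss {best_val:.4f} at step {best_step}")
print(f"train/validation gap at step {final_step}: {gap:.4f}")
```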

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1