---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter10_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_replace_iter10_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.6696
- Num input tokens seen: 8069592
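
For context, the final evaluation loss of 2.6696 corresponds to a perplexity of roughly exp(2.6696) ≈ 14.4. Below is a minimal inference sketch; it assumes the checkpoint is published under the repo id `jkazdan/collapse_gemma-2-2b_hs2_replace_iter10_sftsd1` (inferred from the card title), so adjust it if the actual id differs.

```python
# Minimal inference sketch. The repo id is an assumption inferred from the
# card title; adjust it if the checkpoint lives elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter10_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```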

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
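
These values map directly onto `transformers.TrainingArguments` as consumed by TRL's `SFTTrainer`. The sketch below is a hedged reconstruction, not the original training script: the dataset files, the `text` column name, the `output_dir`, and the eval/logging cadence of 5 steps (inferred from the results table below) are all assumptions.

```python
# Hedged reconstruction of the training setup; dataset files, text field,
# output_dir, and eval cadence are assumptions, not the author's script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter10_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,        # train_batch_size: 8
    per_device_eval_batch_size=16,        # eval_batch_size: 16
    gradient_accumulation_steps=16,       # 8 * 16 = total_train_batch_size 128
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                       # Adam settings listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",                # evaluation every 5 steps, inferred
    eval_steps=5,                         # from the results table below
    logging_steps=5,
    include_num_input_tokens_seen=True,   # matches the "Input Tokens Seen" column
)

# Placeholder data: the actual training/eval sets are not documented.
data = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["eval"],
    tokenizer=tokenizer,
    dataset_text_field="text",  # assumed column name
)
trainer.train()
```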

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6377        | 0.0316 | 5    | 1.3108          | 263072            |
| 1.1505        | 0.0632 | 10   | 1.2446          | 517208            |
| 0.7821        | 0.0947 | 15   | 1.3348          | 769888            |
| 0.5           | 0.1263 | 20   | 1.5482          | 1037144           |
| 0.2735        | 0.1579 | 25   | 1.6972          | 1289920           |
| 0.1577        | 0.1895 | 30   | 1.8590          | 1549584           |
| 0.1591        | 0.2211 | 35   | 2.0535          | 1806400           |
| 0.0789        | 0.2527 | 40   | 2.2425          | 2074288           |
| 0.0452        | 0.2842 | 45   | 2.3649          | 2329816           |
| 0.0415        | 0.3158 | 50   | 2.4586          | 2578632           |
| 0.0324        | 0.3474 | 55   | 2.5176          | 2837904           |
| 0.0304        | 0.3790 | 60   | 2.5963          | 3082160           |
| 0.0255        | 0.4106 | 65   | 2.6502          | 3339600           |
| 0.0273        | 0.4422 | 70   | 2.6701          | 3591560           |
| 0.028         | 0.4737 | 75   | 2.6985          | 3840656           |
| 0.0256        | 0.5053 | 80   | 2.6940          | 4100552           |
| 0.027         | 0.5369 | 85   | 2.6789          | 4356792           |
| 0.0266        | 0.5685 | 90   | 2.6323          | 4606856           |
| 0.0276        | 0.6001 | 95   | 2.6233          | 4858464           |
| 0.0273        | 0.6317 | 100  | 2.6164          | 5115304           |
| 0.0256        | 0.6632 | 105  | 2.6240          | 5366024           |
| 0.0266        | 0.6948 | 110  | 2.6399          | 5619944           |
| 0.0271        | 0.7264 | 115  | 2.6708          | 5867752           |
| 0.0253        | 0.7580 | 120  | 2.6610          | 6119896           |
| 0.027         | 0.7896 | 125  | 2.6729          | 6375064           |
| 0.0228        | 0.8212 | 130  | 2.6813          | 6630984           |
| 0.0245        | 0.8527 | 135  | 2.6851          | 6879904           |
| 0.0229        | 0.8843 | 140  | 2.6942          | 7128024           |
| 0.0242        | 0.9159 | 145  | 2.6830          | 7385408           |
| 0.0255        | 0.9475 | 150  | 2.6686          | 7653440           |
| 0.0241        | 0.9791 | 155  | 2.6643          | 7918000           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1