metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter9_sftsd1
    results: []

collapse_gemma-2-2b_hs2_replace_iter9_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6634
  • Num Input Tokens Seen: 8155608

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
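The relationship between the batch-size and warmup settings listed above can be checked with a little arithmetic. The following is a minimal sketch, not the actual training script. It assumes a single device (the card does not state the device count) and an estimated total of 159 optimizer steps, inferred from the results table reaching step 155 at epoch 0.9760. The names `num_devices`, `total_steps`, and `lr_at` are hypothetical and used only for illustration.

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: one device. The card does not state the device count.
num_devices = 1

# Effective batch size per optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: roughly 159 optimizer steps in the single epoch,
# estimated from the results table (step 155 at epoch 0.9760).
total_steps = 159

# Warmup length implied by the warmup ratio, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate under a constant-with-warmup schedule:
    linear ramp from 0 during warmup, then held constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(effective_batch_size, warmup_steps, lr_at(4), lr_at(100))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 128, and the learning rate reaches its constant value after about 8 steps.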

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5468        | 0.0315 | 5    | 1.3110          | 253928            |
| 1.182         | 0.0630 | 10   | 1.2480          | 514960            |
| 0.8084        | 0.0945 | 15   | 1.3189          | 773760            |
| 0.6416        | 0.1259 | 20   | 1.4974          | 1041648           |
| 0.3966        | 0.1574 | 25   | 1.6165          | 1307976           |
| 0.2039        | 0.1889 | 30   | 1.8225          | 1565976           |
| 0.1576        | 0.2204 | 35   | 1.9499          | 1822872           |
| 0.0829        | 0.2519 | 40   | 2.1969          | 2080200           |
| 0.0476        | 0.2834 | 45   | 2.3565          | 2335552           |
| 0.0338        | 0.3148 | 50   | 2.4119          | 2590880           |
| 0.0303        | 0.3463 | 55   | 2.5071          | 2851232           |
| 0.0381        | 0.3778 | 60   | 2.5463          | 3110576           |
| 0.0307        | 0.4093 | 65   | 2.5668          | 3369800           |
| 0.0279        | 0.4408 | 70   | 2.5711          | 3630600           |
| 0.0262        | 0.4723 | 75   | 2.6104          | 3884416           |
| 0.0284        | 0.5037 | 80   | 2.6201          | 4140232           |
| 0.0265        | 0.5352 | 85   | 2.6255          | 4390344           |
| 0.0265        | 0.5667 | 90   | 2.6473          | 4646944           |
| 0.0288        | 0.5982 | 95   | 2.6452          | 4907960           |
| 0.0242        | 0.6297 | 100  | 2.6281          | 5157432           |
| 0.0235        | 0.6612 | 105  | 2.6248          | 5417680           |
| 0.0256        | 0.6926 | 110  | 2.6399          | 5680504           |
| 0.0224        | 0.7241 | 115  | 2.6534          | 5934288           |
| 0.0246        | 0.7556 | 120  | 2.6607          | 6188664           |
| 0.0313        | 0.7871 | 125  | 2.6628          | 6444560           |
| 0.0252        | 0.8186 | 130  | 2.6540          | 6702464           |
| 0.0258        | 0.8501 | 135  | 2.6528          | 6962424           |
| 0.0276        | 0.8815 | 140  | 2.6468          | 7217352           |
| 0.0245        | 0.9130 | 145  | 2.6580          | 7472288           |
| 0.025         | 0.9445 | 150  | 2.6685          | 7739408           |
| 0.0285        | 0.9760 | 155  | 2.6733          | 8001312           |
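In the results above, training loss falls steadily while validation loss bottoms out early and then climbs, a pattern commonly associated with overfitting. As a rough illustration, the sketch below locates the checkpoint with the lowest validation loss using a hand-copied subset of rows from the table. The `rows` list and variable names are hypothetical helpers, not part of the training code.

```python
# (step, training_loss, validation_loss) copied from a subset of the
# results table; training loss is None where the table says "No log".
rows = [
    (0, None, 1.3956),
    (5, 1.5468, 1.3110),
    (10, 1.182, 1.2480),
    (15, 0.8084, 1.3189),
    (20, 0.6416, 1.4974),
    (50, 0.0338, 2.4119),
    (100, 0.0242, 2.6281),
    (155, 0.0285, 2.6733),
]

# Checkpoint with the lowest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = rows[-1]
final_gap = last_val - last_train

print(f"best validation loss {best_val:.4f} at step {best_step}")
print(f"validation minus training loss at step {last_step}: {final_gap:.4f}")
```

On these rows the minimum validation loss is at step 10, which suggests that an earlier checkpoint, if one was saved, may generalize better than the final one.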

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1