metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter10_sftsd0
    results: []

collapse_gemma-2-2b_hs2_replace_iter10_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6489
  • Num Input Tokens Seen: 7847456
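The reported loss can be read as a perplexity if it is the mean per-token cross-entropy in nats. That is an assumption: the card does not state it, although it is the usual convention for causal language model training. A minimal sketch of the conversion:

```python
import math

# Evaluation loss reported above; assumed to be the mean per-token
# cross-entropy in nats (the card does not say so explicitly).
eval_loss = 2.6489

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```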

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
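The batch-size figures above are related by a simple product, and the warmup ratio implies a short warmup phase. The sketch below checks that arithmetic. It assumes a single training device, estimates the total step count from the results table (not stated in the card), and assumes warmup length is the ratio times the total steps, rounded up.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8              # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: one training device. With N devices the product
# below would be multiplied by N.
num_devices = 1
effective = train_batch_size * gradient_accumulation_steps * num_devices
assert effective == total_train_batch_size

# Assumption: total optimizer steps extrapolated from the results table
# (step 155 corresponds to epoch 0.9760); the card does not state it.
estimated_total_steps = round(155 / 0.9760)

# Assumption: warmup steps = ceil(ratio * total steps).
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)
print(effective, estimated_total_steps, warmup_steps)
```

Under these assumptions the learning rate would ramp up over the first handful of steps and then stay constant at 8e-06 for the rest of the epoch.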

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6165        | 0.0315 | 5    | 1.3101          | 243056            |
| 1.172         | 0.0630 | 10   | 1.2582          | 490584            |
| 0.7463        | 0.0945 | 15   | 1.3463          | 739912            |
| 0.464         | 0.1259 | 20   | 1.5692          | 990256            |
| 0.3508        | 0.1574 | 25   | 1.6804          | 1240472           |
| 0.1841        | 0.1889 | 30   | 1.8268          | 1486504           |
| 0.097         | 0.2204 | 35   | 2.0664          | 1738888           |
| 0.0635        | 0.2519 | 40   | 2.2289          | 1980448           |
| 0.0452        | 0.2834 | 45   | 2.3601          | 2228472           |
| 0.0422        | 0.3148 | 50   | 2.4612          | 2476704           |
| 0.0364        | 0.3463 | 55   | 2.5085          | 2731320           |
| 0.0341        | 0.3778 | 60   | 2.5222          | 2979128           |
| 0.0304        | 0.4093 | 65   | 2.5257          | 3220656           |
| 0.0281        | 0.4408 | 70   | 2.5652          | 3471576           |
| 0.0247        | 0.4723 | 75   | 2.5819          | 3708808           |
| 0.025         | 0.5037 | 80   | 2.5916          | 3965208           |
| 0.0244        | 0.5352 | 85   | 2.6135          | 4212016           |
| 0.026         | 0.5667 | 90   | 2.6298          | 4456472           |
| 0.0236        | 0.5982 | 95   | 2.6157          | 4708392           |
| 0.0232        | 0.6297 | 100  | 2.6080          | 4960984           |
| 0.0234        | 0.6612 | 105  | 2.6128          | 5209920           |
| 0.0294        | 0.6926 | 110  | 2.6223          | 5450632           |
| 0.0257        | 0.7241 | 115  | 2.6158          | 5696272           |
| 0.0264        | 0.7556 | 120  | 2.6134          | 5945648           |
| 0.0245        | 0.7871 | 125  | 2.6294          | 6201960           |
| 0.0235        | 0.8186 | 130  | 2.6448          | 6449680           |
| 0.0225        | 0.8501 | 135  | 2.6495          | 6695680           |
| 0.0251        | 0.8815 | 140  | 2.6462          | 6946656           |
| 0.0252        | 0.9130 | 145  | 2.6537          | 7189752           |
| 0.0223        | 0.9445 | 150  | 2.6462          | 7443512           |
| 0.0217        | 0.9760 | 155  | 2.6488          | 7693456           |
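In the table, validation loss reaches its minimum early and then climbs while training loss keeps falling, a pattern that usually suggests overfitting to the training data. The sketch below uses a handful of rows copied from the table to locate the best validation step, the final train/validation gap, and the average number of tokens per optimizer step. The row selection and variable names are illustrative only.

```python
# A few rows copied from the table above:
# (step, training_loss, validation_loss, input_tokens_seen)
rows = [
    (5, 1.6165, 1.3101, 243056),
    (10, 1.1720, 1.2582, 490584),
    (15, 0.7463, 1.3463, 739912),
    (50, 0.0422, 2.4612, 2476704),
    (100, 0.0232, 2.6080, 4960984),
    (155, 0.0217, 2.6488, 7693456),
]

# Step with the lowest validation loss among the sampled rows.
best = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last = rows[-1]
gap = last[2] - last[1]

# Average tokens consumed per optimizer step over the logged run.
tokens_per_step = last[3] / last[0]

print(f"best step: {best[0]} (validation loss {best[2]})")
print(f"final gap: {gap:.4f}")
print(f"tokens per step: {tokens_per_step:.0f}")
```

The lowest validation loss falls at step 10 whether these sampled rows or the full table are used, so a checkpoint from around that point would likely generalize better than the final one, if such a checkpoint was saved.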

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1