---
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_replace_iter2_sftsd0
    results: []
---

# collapse_gemma-2-27b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.1980
- Num Input Tokens Seen: 3884172
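
Since this is a causal-language-model fine-tune of gemma-2-27b, it should load through the standard `transformers` API. A minimal loading sketch follows; the repo id is an assumption inferred from the model name, and the dtype/device settings are illustrative, not prescribed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the model name on this card.
repo_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_replace_iter2_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; half precision keeps memory manageable
    device_map="auto",           # requires `accelerate` for automatic placement
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```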

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
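
The original TRL training script is not included in this card, so the following is only a sketch of how the listed values map onto `transformers.TrainingArguments`, not the exact configuration used:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_replace_iter2_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # 4 x 32 accumulation steps = total batch of 128,
    gradient_accumulation_steps=32,  # assuming a single device
    per_device_eval_batch_size=16,
    seed=0,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```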

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.5619        | 0.0609 | 5    | 1.0479          | 234712            |
| 2.306         | 0.1217 | 10   | 1.0663          | 474768            |
| 1.7728        | 0.1826 | 15   | 1.0981          | 710708            |
| 1.6061        | 0.2434 | 20   | 1.1511          | 948504            |
| 1.2679        | 0.3043 | 25   | 1.1725          | 1186572           |
| 1.1194        | 0.3652 | 30   | 1.1757          | 1416168           |
| 0.9711        | 0.4260 | 35   | 1.1728          | 1646396           |
| 1.0101        | 0.4869 | 40   | 1.1625          | 1890108           |
| 0.9108        | 0.5477 | 45   | 1.1646          | 2129884           |
| 0.8715        | 0.6086 | 50   | 1.1616          | 2364656           |
| 0.9123        | 0.6695 | 55   | 1.1721          | 2601440           |
| 0.8214        | 0.7303 | 60   | 1.1805          | 2841352           |
| 1.0584        | 0.7912 | 65   | 1.1801          | 3084488           |
| 0.7651        | 0.8520 | 70   | 1.1723          | 3317380           |
| 0.7307        | 0.9129 | 75   | 1.1715          | 3557164           |
| 0.8541        | 0.9738 | 80   | 1.1968          | 3787968           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
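
Pinning the versions above should be enough to reproduce the environment; a sketch of a matching requirements file (the `+cu121` torch build assumes installing from the PyTorch CUDA 12.1 wheel index):

```
transformers==4.44.0
torch==2.4.0+cu121
datasets==2.20.0
tokenizers==0.19.1
```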