collapse_gemma-2-2b_hs2_replace_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2615
  • Num Input Tokens Seen: 8543104

Model description

More information needed

Intended uses & limitations

More information needed
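
No usage guidance was provided with this checkpoint. As a starting point, here is a minimal inference sketch, assuming the model is published on the Hugging Face Hub as jkazdan/collapse_gemma-2-2b_hs2_replace_iter4_sftsd1; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch; the prompt and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```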

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
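
For reference, these settings map onto the Transformers Trainer API roughly as below. This is a sketch of the configuration only, not the authors' training script; the output_dir is an assumption, and model/dataset wiring is omitted:

```python
# Hypothetical TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter4_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 per device * 16 steps = 128 total
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```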

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5484        | 0.0318 | 5    | 1.3101          | 278776            |
| 1.085         | 0.0636 | 10   | 1.2389          | 553696            |
| 0.9004        | 0.0954 | 15   | 1.2791          | 827056            |
| 0.6616        | 0.1272 | 20   | 1.4012          | 1104568           |
| 0.4986        | 0.1590 | 25   | 1.5346          | 1374176           |
| 0.3579        | 0.1908 | 30   | 1.6346          | 1652112           |
| 0.174         | 0.2226 | 35   | 1.8294          | 1920080           |
| 0.1501        | 0.2544 | 40   | 1.9929          | 2190160           |
| 0.0726        | 0.2862 | 45   | 2.1418          | 2461008           |
| 0.0577        | 0.3180 | 50   | 2.2236          | 2733848           |
| 0.0399        | 0.3498 | 55   | 2.2644          | 3013016           |
| 0.04          | 0.3816 | 60   | 2.2704          | 3278704           |
| 0.0398        | 0.4134 | 65   | 2.2606          | 3562656           |
| 0.0475        | 0.4452 | 70   | 2.2814          | 3834984           |
| 0.0332        | 0.4769 | 75   | 2.2999          | 4109144           |
| 0.0352        | 0.5087 | 80   | 2.2843          | 4381312           |
| 0.0369        | 0.5405 | 85   | 2.2426          | 4652664           |
| 0.0333        | 0.5723 | 90   | 2.2229          | 4923152           |
| 0.029         | 0.6041 | 95   | 2.2462          | 5195000           |
| 0.0306        | 0.6359 | 100  | 2.2501          | 5458808           |
| 0.0307        | 0.6677 | 105  | 2.2394          | 5732184           |
| 0.0367        | 0.6995 | 110  | 2.2141          | 6004504           |
| 0.03          | 0.7313 | 115  | 2.2083          | 6272888           |
| 0.0282        | 0.7631 | 120  | 2.2138          | 6546192           |
| 0.0288        | 0.7949 | 125  | 2.2392          | 6808560           |
| 0.0292        | 0.8267 | 130  | 2.2351          | 7076096           |
| 0.028         | 0.8585 | 135  | 2.2202          | 7349312           |
| 0.0303        | 0.8903 | 140  | 2.2292          | 7627368           |
| 0.0285        | 0.9221 | 145  | 2.2498          | 7893744           |
| 0.0261        | 0.9539 | 150  | 2.2720          | 8162232           |
| 0.0317        | 0.9857 | 155  | 2.2699          | 8431896           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
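
To recreate this environment, the following requirements.txt sketch pins the versions listed above. Note that the +cu121 build of PyTorch is served from the dedicated wheel index (https://download.pytorch.org/whl/cu121), which pip needs via --extra-index-url; other CUDA setups will need a different build tag:

```
transformers==4.44.0
torch==2.4.0+cu121
datasets==2.20.0
tokenizers==0.19.1
```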