collapse_gemma-2-2b_hs2_replace_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4466
  • Num Input Tokens Seen: 8280920
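
For reference, below is a minimal inference sketch that loads the checkpoint with the standard transformers causal-LM classes; the prompt, dtype, and generation settings are illustrative assumptions, not part of the original training or evaluation setup.

```python
# Minimal usage sketch (assumed API: standard transformers AutoClasses).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter5_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint weights
)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```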

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a TrainingArguments sketch reproducing them appears after the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
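
As a rough guide, the settings above map onto a transformers TrainingArguments object as sketched below. The output directory is a placeholder, single-device training is assumed (8 × 16 = 128 total train batch size), and the Adam betas and epsilon listed above are already the library defaults.

```python
# Sketch only: reconstructs the listed hyperparameters as TrainingArguments.
# output_dir is a placeholder; adam_beta1/adam_beta2/adam_epsilon are left
# at their defaults, which already match (0.9, 0.999) and 1e-08.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```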

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6329        | 0.0318 | 5    | 1.3100          | 261936            |
| 1.0812        | 0.0635 | 10   | 1.2393          | 527984            |
| 0.8509        | 0.0953 | 15   | 1.2939          | 798792            |
| 0.5856        | 0.1271 | 20   | 1.4435          | 1068224           |
| 0.3945        | 0.1589 | 25   | 1.5981          | 1333664           |
| 0.2591        | 0.1906 | 30   | 1.7370          | 1600920           |
| 0.2297        | 0.2224 | 35   | 1.9540          | 1862864           |
| 0.1491        | 0.2542 | 40   | 2.0318          | 2119104           |
| 0.0693        | 0.2859 | 45   | 2.2388          | 2377720           |
| 0.0509        | 0.3177 | 50   | 2.3196          | 2637816           |
| 0.0475        | 0.3495 | 55   | 2.3864          | 2900952           |
| 0.034         | 0.3813 | 60   | 2.4376          | 3166456           |
| 0.0324        | 0.4130 | 65   | 2.4449          | 3436144           |
| 0.034         | 0.4448 | 70   | 2.4523          | 3702280           |
| 0.0326        | 0.4766 | 75   | 2.4438          | 3966328           |
| 0.0336        | 0.5083 | 80   | 2.4354          | 4221440           |
| 0.0313        | 0.5401 | 85   | 2.4139          | 4486432           |
| 0.0283        | 0.5719 | 90   | 2.3846          | 4751320           |
| 0.0301        | 0.6037 | 95   | 2.3932          | 5019592           |
| 0.0284        | 0.6354 | 100  | 2.4044          | 5280712           |
| 0.0256        | 0.6672 | 105  | 2.4084          | 5539944           |
| 0.0329        | 0.6990 | 110  | 2.4300          | 5807632           |
| 0.0266        | 0.7307 | 115  | 2.4236          | 6068760           |
| 0.0267        | 0.7625 | 120  | 2.4100          | 6331712           |
| 0.0268        | 0.7943 | 125  | 2.4094          | 6593680           |
| 0.0272        | 0.8261 | 130  | 2.4229          | 6859744           |
| 0.0296        | 0.8578 | 135  | 2.4294          | 7118040           |
| 0.027         | 0.8896 | 140  | 2.4374          | 7383424           |
| 0.0264        | 0.9214 | 145  | 2.4434          | 7650680           |
| 0.0248        | 0.9531 | 150  | 2.4362          | 7915376           |
| 0.0264        | 0.9849 | 155  | 2.4400          | 8174680           |

Note that training loss falls below 0.03 while validation loss bottoms out at 1.2393 (step 10) and then climbs back to roughly 2.44: the model increasingly overfits the fine-tuning data as the epoch progresses.

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
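
If reproducing results, a quick way to compare the local environment against these versions is sketched below; the listed pins are what training used, not strict requirements.

```python
# Sanity check: print installed versions next to the ones used for training
# (assumption: nearby versions will also load the checkpoint correctly).
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, trained with {want}")
```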

Model size: 2.61B params (BF16, Safetensors)

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter5_sftsd1

  • Base model: google/gemma-2-2b