---
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_replace_iter1_sftsd1
    results: []
---
collapse_gemma-2-27b_hs2_replace_iter1_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.9050
Num Input Tokens Seen: 5254884
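
This sketch reads the loss as a perplexity. It assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual objective for causal-LM fine-tuning with the Transformers Trainer; the card does not state this. Under that assumption, perplexity is exp(loss).

```python
import math


def perplexity_from_loss(mean_token_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (natural log) to perplexity.

    Assumption: the loss is averaged over tokens and uses the natural logarithm;
    neither detail is stated on this card.
    """
    return math.exp(mean_token_nll)


# Final evaluation loss reported above.
eval_loss = 0.9050
print(f"perplexity = {perplexity_from_loss(eval_loss):.3f}")
```

Under these assumptions, the final evaluation loss corresponds to a perplexity of roughly 2.47.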

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 4
eval_batch_size: 16
seed: 1
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
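
This sketch shows how the batch-size and scheduler settings above fit together. It assumes a single training device: 4 × 32 = 128 is consistent with that, but the card does not say how many devices were used. The warmup length is the warmup ratio times the total number of optimizer steps, rounded up, as the Transformers Trainer does when only a ratio is given. The learning rate ramps up linearly over that many steps and then stays at 8e-06, the same shape as `transformers.get_constant_schedule_with_warmup`. `TOTAL_STEPS = 97` is a hypothetical value for illustration only; the card does not report the total step count.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * grad_accum * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Round the warmup fraction up to a whole number of optimizer steps."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over n_warmup steps, then hold base_lr constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


BASE_LR = 8e-06
TOTAL_STEPS = 97  # hypothetical: the card does not state the total number of optimizer steps

print(effective_batch_size(4, 32))  # 128, matching total_train_batch_size on one device
n_warm = warmup_steps(TOTAL_STEPS, 0.05)
for step in (0, 1, n_warm - 1, n_warm, TOTAL_STEPS - 1):
    print(step, constant_with_warmup_lr(step, BASE_LR, n_warm))
```

Because the schedule is constant after warmup, the learning rate does not decay over the single epoch. This matches the `constant_with_warmup` setting listed above.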

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| No log | 0 | 0 | 1.1282 | 0 |
| 0.9865 | 0.0511 | 5 | 0.9815 | 260128 |
| 0.9827 | 0.1021 | 10 | 0.9503 | 527396 |
| 0.9415 | 0.1532 | 15 | 0.9387 | 803280 |
| 0.9777 | 0.2043 | 20 | 0.9341 | 1074404 |
| 0.896 | 0.2553 | 25 | 0.9291 | 1348060 |
| 0.9836 | 0.3064 | 30 | 0.9259 | 1614960 |
| 0.8868 | 0.3575 | 35 | 0.9217 | 1884844 |
| 0.9037 | 0.4086 | 40 | 0.9192 | 2154208 |
| 0.9543 | 0.4596 | 45 | 0.9170 | 2424544 |
| 0.8617 | 0.5107 | 50 | 0.9155 | 2690292 |
| 0.9376 | 0.5618 | 55 | 0.9136 | 2962944 |
| 0.9256 | 0.6128 | 60 | 0.9114 | 3234692 |
| 0.8981 | 0.6639 | 65 | 0.9102 | 3510980 |
| 0.904 | 0.7150 | 70 | 0.9086 | 3790388 |
| 0.8904 | 0.7660 | 75 | 0.9081 | 4069200 |
| 0.9635 | 0.8171 | 80 | 0.9078 | 4338748 |
| 0.9016 | 0.8682 | 85 | 0.9061 | 4606552 |
| 0.8514 | 0.9192 | 90 | 0.9062 | 4877900 |
| 0.8992 | 0.9703 | 95 | 0.9058 | 5147172 |
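
Two quantities can be read off the training results table with simple arithmetic:

- the average number of input tokens processed per optimizer step;
- the relative drop in validation loss from the initial checkpoint.

The sketch below uses a few rows copied from the table; the choice of rows is arbitrary. The card does not say whether the token counter includes padding positions.

```python
# (step, validation_loss, input_tokens_seen) rows copied from the training results table.
ROWS = [
    (0, 1.1282, 0),
    (5, 0.9815, 260128),
    (50, 0.9155, 2690292),
    (95, 0.9058, 5147172),
]


def tokens_per_step(a, b):
    """Average input tokens per optimizer step between two logged rows."""
    (step_a, _, tok_a), (step_b, _, tok_b) = a, b
    return (tok_b - tok_a) / (step_b - step_a)


def relative_loss_drop(first, last):
    """Fractional reduction in validation loss between two logged rows."""
    return (first[1] - last[1]) / first[1]


print(f"{tokens_per_step(ROWS[1], ROWS[-1]):.0f} tokens/step")
print(f"{relative_loss_drop(ROWS[0], ROWS[-1]):.1%} lower validation loss")
```

Between steps 5 and 95, this works out to roughly 54,300 input tokens per update. Validation loss falls by about 19.7% from step 0 to step 95.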

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
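
When trying to match this environment, the installed versions can be compared against the ones listed above. Below is a minimal sketch using the standard-library `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. It also assumes exact version matches matter for reproduction, which the card itself does not claim.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card (assumed PyPI distribution names as keys).
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0",  # listed as 2.4.0+cu121; the +cu121 local tag is ignored below
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Return {package: installed_version_or_None} for packages that differ from `expected`."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            mismatches[pkg] = None  # not installed
            continue
        # Compare the public version only, dropping local build tags such as "+cu121".
        if have.split("+")[0] != want:
            mismatches[pkg] = have
    return mismatches


print(version_mismatches(EXPECTED) or "all listed framework versions match")
```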