metadata

license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter6_sftsd2
    results: []

collapse_gemma-2-2b_hs2_replace_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-2b. The fine-tuning dataset is not recorded in this card. It achieves the following results on the evaluation set:

Loss: 2.5403
Num Input Tokens Seen: 8006024
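
To make the reported loss easier to interpret, it can be converted to perplexity. The minimal sketch below assumes the loss is a mean per-token cross-entropy measured in nats, which this card does not state explicitly; the helper name `perplexity` is illustrative only.

```python
import math

# Evaluation loss reported in this card.
# Assumption: it is a mean per-token cross-entropy in nats.
eval_loss = 2.5403


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


ppl = perplexity(eval_loss)
print(f"eval loss {eval_loss} -> perplexity of roughly {ppl:.1f}")
```

Under that assumption, the reported loss corresponds to a perplexity of roughly 12.7.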

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
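
The relationship between these values can be checked with a little arithmetic. The sketch below assumes a single training device (the card does not state the device count) and assumes the warmup length is the ceiling of the warmup ratio times the total number of optimizer steps. The total step count is not listed among the hyperparameters, so it is estimated from the last logged row of the training results (step 155 at epoch 0.9760). All function names are illustrative.

```python
import math

# Values copied from the hyperparameter list in this card.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
lr_scheduler_warmup_ratio = 0.05

# Assumption: one device. The card does not say how many were used.
num_devices = 1


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Sequences contributing to one optimizer update."""
    return per_device * accum_steps * devices


def estimate_total_steps(last_step: int, last_epoch: float) -> int:
    """Estimate steps in one epoch from a logged (step, epoch) pair."""
    return round(last_step / last_epoch)


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Assumed rule: ceiling of ratio times total optimizer steps."""
    return math.ceil(total_steps * ratio)


batch = effective_batch_size(train_batch_size, gradient_accumulation_steps, num_devices)
steps = estimate_total_steps(155, 0.9760)
warmup = warmup_steps(steps, lr_scheduler_warmup_ratio)

print(f"effective batch size: {batch}")
print(f"estimated optimizer steps in the epoch: {steps}")
print(f"estimated warmup steps: {warmup}")
```

With one device, 8 x 16 reproduces the listed total_train_batch_size of 128. The learning rate would then ramp up over only the first handful of the estimated optimizer steps before holding constant at 8e-06.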

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3956	0
1.6849	0.0315	5	1.3074	251000
1.118	0.0630	10	1.2395	501688
0.7452	0.0945	15	1.3250	754952
0.6083	0.1259	20	1.4758	1008248
0.3921	0.1574	25	1.6064	1265824
0.259	0.1889	30	1.8043	1526472
0.1544	0.2204	35	1.9984	1787752
0.0991	0.2519	40	2.1525	2040120
0.0448	0.2834	45	2.2692	2285400
0.0467	0.3148	50	2.3345	2536824
0.0435	0.3463	55	2.4326	2790040
0.033	0.3778	60	2.5077	3046656
0.031	0.4093	65	2.5876	3295608
0.0311	0.4408	70	2.5704	3545992
0.0287	0.4723	75	2.5464	3802920
0.0257	0.5037	80	2.5635	4056400
0.0303	0.5352	85	2.5473	4310104
0.0252	0.5667	90	2.5338	4566456
0.0271	0.5982	95	2.5463	4822016
0.0269	0.6297	100	2.5515	5074048
0.0264	0.6612	105	2.5565	5332864
0.0272	0.6926	110	2.5661	5586528
0.025	0.7241	115	2.5334	5839624
0.0264	0.7556	120	2.5193	6095336
0.0252	0.7871	125	2.5051	6352376
0.0243	0.8186	130	2.5119	6603584
0.0281	0.8501	135	2.5157	6852952
0.0261	0.8815	140	2.5087	7101856
0.0255	0.9130	145	2.5109	7353304
0.0253	0.9445	150	2.5341	7598048
0.0262	0.9760	155	2.5437	7851000
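
The table can be inspected programmatically to locate the checkpoint with the lowest validation loss and to measure how far training and validation loss diverge. The sketch below hard-codes a subset of the rows above (not the full log) and uses illustrative helper names. Treating the widening gap as a sign of overfitting is an interpretation, not something the card itself claims.

```python
# Subset of rows from the table: (step, training_loss, validation_loss).
# training_loss is None where the card logs "No log".
ROWS = [
    (0, None, 1.3956),
    (5, 1.6849, 1.3074),
    (10, 1.118, 1.2395),
    (15, 0.7452, 1.3250),
    (20, 0.6083, 1.4758),
    (40, 0.0991, 2.1525),
    (80, 0.0257, 2.5635),
    (120, 0.0264, 2.5193),
    (155, 0.0262, 2.5437),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss, or None if no training loss was logged."""
    _, train_loss, val_loss = row
    return None if train_loss is None else val_loss - train_loss


best = best_checkpoint(ROWS)
final_gap = generalization_gap(ROWS[-1])

print(f"lowest validation loss: {best[2]} at step {best[0]}")
print(f"validation minus training loss at step {ROWS[-1][0]}: {final_gap:.4f}")
```

Among the logged rows, validation loss is lowest at step 10 (1.2395) and then climbs to about 2.5 while training loss falls to about 0.025.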

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
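
When trying to reproduce the environment, it may help to compare installed package versions against the ones listed above. The sketch below uses only the Python standard library. The distribution names are assumptions (in particular, "Pytorch" is assumed to be installed as `torch`), and the helper names are illustrative.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed pip distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_versions(names):
    """Look up installed versions; None marks a package that is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, installed):
    """Map each differing package to an (expected, installed) pair."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


diff = mismatches(EXPECTED, installed_versions(EXPECTED))
for name, (want, have) in diff.items():
    print(f"{name}: card lists {want}, found {have}")
```

An empty result means the local environment matches the listed versions exactly. A mismatch is not necessarily a problem, since nearby versions may behave the same.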