license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter10_sftsd1
    results: []

collapse_gemma-2-2b_hs2_replace_iter10_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 2.6696
Num Input Tokens Seen: 8069592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 1
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
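
The relationship between some of these values can be sketched in a few lines of Python. This is a minimal illustration, not the training script: it assumes a single device (the card does not state the device count) and estimates the total number of optimizer steps from the training results table, where step 155 corresponds to epoch 0.9791. The helper names `effective_batch_size` and `constant_with_warmup_lr` are hypothetical; the warmup rule shown (linear ramp from 0 to the base rate, then constant) is the usual meaning of constant_with_warmup.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Assumptions, not stated in the card: one device, and a total step count
# estimated from the results table (step 155 is logged at epoch 0.9791).
NUM_DEVICES = 1
ESTIMATED_TOTAL_STEPS = round(155 / 0.9791)


def effective_batch_size(per_device, accumulation_steps, num_devices):
    """Number of sequences that contribute to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def constant_with_warmup_lr(step, base_lr, warmup_steps):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_batch = effective_batch_size(
    TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
)
# Rounding the warmup length up is an assumption about how the ratio is applied.
warmup_steps = math.ceil(ESTIMATED_TOTAL_STEPS * WARMUP_RATIO)

print("total_train_batch_size:", total_batch)
print("estimated warmup steps:", warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, 100):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, warmup_steps))
```

Under the single-device assumption, 8 x 16 reproduces the listed total_train_batch_size of 128. The warmup length is only as reliable as the estimated step count.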

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3956	0
1.6377	0.0316	5	1.3108	263072
1.1505	0.0632	10	1.2446	517208
0.7821	0.0947	15	1.3348	769888
0.5	0.1263	20	1.5482	1037144
0.2735	0.1579	25	1.6972	1289920
0.1577	0.1895	30	1.8590	1549584
0.1591	0.2211	35	2.0535	1806400
0.0789	0.2527	40	2.2425	2074288
0.0452	0.2842	45	2.3649	2329816
0.0415	0.3158	50	2.4586	2578632
0.0324	0.3474	55	2.5176	2837904
0.0304	0.3790	60	2.5963	3082160
0.0255	0.4106	65	2.6502	3339600
0.0273	0.4422	70	2.6701	3591560
0.028	0.4737	75	2.6985	3840656
0.0256	0.5053	80	2.6940	4100552
0.027	0.5369	85	2.6789	4356792
0.0266	0.5685	90	2.6323	4606856
0.0276	0.6001	95	2.6233	4858464
0.0273	0.6317	100	2.6164	5115304
0.0256	0.6632	105	2.6240	5366024
0.0266	0.6948	110	2.6399	5619944
0.0271	0.7264	115	2.6708	5867752
0.0253	0.7580	120	2.6610	6119896
0.027	0.7896	125	2.6729	6375064
0.0228	0.8212	130	2.6813	6630984
0.0245	0.8527	135	2.6851	6879904
0.0229	0.8843	140	2.6942	7128024
0.0242	0.9159	145	2.6830	7385408
0.0255	0.9475	150	2.6686	7653440
0.0241	0.9791	155	2.6643	7918000
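
One way to read the table above is to locate the evaluation step with the lowest validation loss and compare training and validation loss at the last logged step. The sketch below does this with values copied from the table; the variable names are illustrative, and it makes no claim about which checkpoint was actually saved or published.

```python
# (step, training_loss, validation_loss) copied from the table above.
# Step 0 has no training loss ("No log"), so it is stored as None.
ROWS = [
    (0, None, 1.3956), (5, 1.6377, 1.3108), (10, 1.1505, 1.2446),
    (15, 0.7821, 1.3348), (20, 0.5, 1.5482), (25, 0.2735, 1.6972),
    (30, 0.1577, 1.8590), (35, 0.1591, 2.0535), (40, 0.0789, 2.2425),
    (45, 0.0452, 2.3649), (50, 0.0415, 2.4586), (55, 0.0324, 2.5176),
    (60, 0.0304, 2.5963), (65, 0.0255, 2.6502), (70, 0.0273, 2.6701),
    (75, 0.028, 2.6985), (80, 0.0256, 2.6940), (85, 0.027, 2.6789),
    (90, 0.0266, 2.6323), (95, 0.0276, 2.6233), (100, 0.0273, 2.6164),
    (105, 0.0256, 2.6240), (110, 0.0266, 2.6399), (115, 0.0271, 2.6708),
    (120, 0.0253, 2.6610), (125, 0.027, 2.6729), (130, 0.0228, 2.6813),
    (135, 0.0245, 2.6851), (140, 0.0229, 2.6942), (145, 0.0242, 2.6830),
    (150, 0.0255, 2.6686), (155, 0.0241, 2.6643),
]

# Evaluation step with the lowest validation loss.
best_step, _, best_val = min(ROWS, key=lambda row: row[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = ROWS[-1]
final_gap = last_val - last_train

print(f"lowest validation loss: {best_val:.4f} at step {best_step}")
print(f"validation minus training loss at step {last_step}: {final_gap:.4f}")
```

In these numbers the validation loss is lowest at step 10 and rises afterwards while the training loss keeps falling, which is worth keeping in mind when interpreting the final evaluation loss reported at the top of the card.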

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1