# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1082
- Num Input Tokens Seen: 22210232
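Since the card gives no usage notes, here is a minimal inference sketch assuming the checkpoint loads like any other `gemma-2-2b` fine-tune via `transformers`; the repo id, prompt, and generation settings are illustrative, not from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model's Hub location; device placement via
# device_map="auto" additionally requires the `accelerate` package.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```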
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (train_batch_size 8 × gradient_accumulation_steps 16)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
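For reference, these settings map onto `transformers.TrainingArguments` roughly as follows. This is a sketch of the configuration only; the output directory, dataset, and `Trainer` wiring are assumptions, not taken from the card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,     # train_batch_size: 8
    per_device_eval_batch_size=16,     # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,    # effective batch: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```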
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.503 | 0.0126 | 5 | 1.3800 | 282920 |
1.455 | 0.0252 | 10 | 1.2923 | 559184 |
1.3333 | 0.0378 | 15 | 1.2129 | 846856 |
1.2486 | 0.0504 | 20 | 1.1638 | 1126408 |
1.1551 | 0.0631 | 25 | 1.1452 | 1406528 |
1.1054 | 0.0757 | 30 | 1.1245 | 1688296 |
1.0965 | 0.0883 | 35 | 1.1353 | 1968880 |
1.0551 | 0.1009 | 40 | 1.1321 | 2253600 |
1.0597 | 0.1135 | 45 | 1.1559 | 2533712 |
0.9056 | 0.1261 | 50 | 1.1557 | 2816168 |
0.8464 | 0.1387 | 55 | 1.1733 | 3098832 |
0.9006 | 0.1513 | 60 | 1.1706 | 3382160 |
0.9186 | 0.1640 | 65 | 1.1701 | 3666944 |
0.8413 | 0.1766 | 70 | 1.1751 | 3944648 |
0.7113 | 0.1892 | 75 | 1.1802 | 4223664 |
0.7537 | 0.2018 | 80 | 1.1851 | 4508224 |
0.6394 | 0.2144 | 85 | 1.1706 | 4784136 |
0.6311 | 0.2270 | 90 | 1.1754 | 5067048 |
0.6254 | 0.2396 | 95 | 1.1784 | 5349712 |
0.6607 | 0.2522 | 100 | 1.1751 | 5633272 |
0.5837 | 0.2649 | 105 | 1.1756 | 5912768 |
0.6424 | 0.2775 | 110 | 1.1776 | 6191704 |
0.6406 | 0.2901 | 115 | 1.1754 | 6470568 |
0.5878 | 0.3027 | 120 | 1.1710 | 6744504 |
0.5724 | 0.3153 | 125 | 1.1764 | 7024664 |
0.5836 | 0.3279 | 130 | 1.1698 | 7302984 |
0.446 | 0.3405 | 135 | 1.1691 | 7585104 |
0.5857 | 0.3531 | 140 | 1.1700 | 7862824 |
0.5039 | 0.3658 | 145 | 1.1668 | 8148912 |
0.5541 | 0.3784 | 150 | 1.1697 | 8433288 |
0.4768 | 0.3910 | 155 | 1.1661 | 8709864 |
0.5697 | 0.4036 | 160 | 1.1624 | 8988544 |
0.4883 | 0.4162 | 165 | 1.1638 | 9266360 |
0.4343 | 0.4288 | 170 | 1.1564 | 9543464 |
0.4952 | 0.4414 | 175 | 1.1573 | 9819888 |
0.4182 | 0.4540 | 180 | 1.1566 | 10103184 |
0.4055 | 0.4667 | 185 | 1.1518 | 10386496 |
0.4183 | 0.4793 | 190 | 1.1527 | 10666176 |
0.4075 | 0.4919 | 195 | 1.1490 | 10945288 |
0.5048 | 0.5045 | 200 | 1.1506 | 11223232 |
0.4409 | 0.5171 | 205 | 1.1465 | 11500056 |
0.4171 | 0.5297 | 210 | 1.1466 | 11780848 |
0.4131 | 0.5423 | 215 | 1.1399 | 12068144 |
0.4431 | 0.5549 | 220 | 1.1458 | 12350288 |
0.506 | 0.5676 | 225 | 1.1378 | 12628160 |
0.4679 | 0.5802 | 230 | 1.1369 | 12916360 |
0.3934 | 0.5928 | 235 | 1.1356 | 13195560 |
0.399 | 0.6054 | 240 | 1.1323 | 13478840 |
0.3821 | 0.6180 | 245 | 1.1334 | 13758120 |
0.4344 | 0.6306 | 250 | 1.1333 | 14040032 |
0.4234 | 0.6432 | 255 | 1.1304 | 14330400 |
0.3893 | 0.6558 | 260 | 1.1310 | 14609640 |
0.4944 | 0.6685 | 265 | 1.1288 | 14888960 |
0.3908 | 0.6811 | 270 | 1.1267 | 15176120 |
0.4795 | 0.6937 | 275 | 1.1300 | 15451048 |
0.3164 | 0.7063 | 280 | 1.1254 | 15731384 |
0.3661 | 0.7189 | 285 | 1.1277 | 16012616 |
0.4078 | 0.7315 | 290 | 1.1210 | 16294800 |
0.3492 | 0.7441 | 295 | 1.1256 | 16575776 |
0.3645 | 0.7567 | 300 | 1.1228 | 16854944 |
0.3274 | 0.7694 | 305 | 1.1202 | 17128336 |
0.4235 | 0.7820 | 310 | 1.1261 | 17405248 |
0.3793 | 0.7946 | 315 | 1.1186 | 17689720 |
0.3922 | 0.8072 | 320 | 1.1193 | 17960552 |
0.3589 | 0.8198 | 325 | 1.1177 | 18241224 |
0.3804 | 0.8324 | 330 | 1.1196 | 18526704 |
0.4036 | 0.8450 | 335 | 1.1169 | 18799280 |
0.4325 | 0.8576 | 340 | 1.1151 | 19085152 |
0.4554 | 0.8703 | 345 | 1.1187 | 19360616 |
0.4497 | 0.8829 | 350 | 1.1144 | 19636560 |
0.4199 | 0.8955 | 355 | 1.1148 | 19914344 |
0.4325 | 0.9081 | 360 | 1.1146 | 20197568 |
0.4471 | 0.9207 | 365 | 1.1124 | 20475496 |
0.3495 | 0.9333 | 370 | 1.1119 | 20753488 |
0.3166 | 0.9459 | 375 | 1.1116 | 21032504 |
0.4198 | 0.9585 | 380 | 1.1131 | 21311792 |
0.3419 | 0.9711 | 385 | 1.1107 | 21593296 |
0.3901 | 0.9838 | 390 | 1.1103 | 21874144 |
0.4237 | 0.9964 | 395 | 1.1078 | 22154792 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1