# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.1022
- Num Input Tokens Seen: 22054048
## Model description

More information needed

## Intended uses & limitations

More information needed
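The card does not specify how the model should be used. As a rough starting point, the sketch below loads the checkpoint for causal-LM inference with the standard `transformers` Auto classes; the hub repo id `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2` is assumed, and the prompt, dtype, and generation settings are illustrative choices, not recommendations from the author.

```python
# Minimal inference sketch (assumed repo id; prompt and settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 checkpoints are commonly run in bfloat16
    device_map="auto",           # requires accelerate; places weights on available devices
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```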
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
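
The hyperparameters above map naturally onto `transformers.TrainingArguments`; a sketch under that assumption is shown below. Only the listed values come from this card; the `output_dir` is a placeholder, and model/dataset loading is omitted. Note that the total train batch size of 128 is the per-device batch size of 8 times the 16 gradient-accumulation steps.

```python
# Sketch: the reported hyperparameters expressed as TrainingArguments.
# Only the values listed in the card are used; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```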
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.628 | 0.0127 | 5 | 1.3800 | 282000 |
| 1.6129 | 0.0254 | 10 | 1.2915 | 565768 |
| 1.4755 | 0.0381 | 15 | 1.2119 | 845776 |
| 1.2663 | 0.0508 | 20 | 1.1654 | 1119976 |
| 1.2503 | 0.0636 | 25 | 1.1530 | 1405752 |
| 1.0375 | 0.0763 | 30 | 1.1358 | 1683544 |
| 0.9388 | 0.0890 | 35 | 1.1575 | 1962800 |
| 0.8887 | 0.1017 | 40 | 1.1613 | 2242448 |
| 0.9444 | 0.1144 | 45 | 1.1814 | 2530200 |
| 0.8274 | 0.1271 | 50 | 1.1685 | 2813744 |
| 0.7725 | 0.1398 | 55 | 1.1846 | 3088392 |
| 0.7435 | 0.1525 | 60 | 1.1750 | 3367968 |
| 0.8112 | 0.1652 | 65 | 1.1798 | 3653616 |
| 0.6116 | 0.1779 | 70 | 1.1803 | 3935936 |
| 0.6364 | 0.1907 | 75 | 1.1648 | 4215056 |
| 0.6888 | 0.2034 | 80 | 1.1682 | 4498800 |
| 0.6489 | 0.2161 | 85 | 1.1755 | 4777456 |
| 0.5009 | 0.2288 | 90 | 1.1711 | 5056576 |
| 0.6014 | 0.2415 | 95 | 1.1619 | 5333256 |
| 0.6265 | 0.2542 | 100 | 1.1702 | 5607960 |
| 0.4422 | 0.2669 | 105 | 1.1616 | 5888544 |
| 0.5504 | 0.2796 | 110 | 1.1721 | 6157688 |
| 0.5325 | 0.2923 | 115 | 1.1638 | 6436816 |
| 0.4722 | 0.3051 | 120 | 1.1622 | 6720832 |
| 0.3832 | 0.3178 | 125 | 1.1592 | 7010752 |
| 0.5639 | 0.3305 | 130 | 1.1548 | 7296936 |
| 0.4615 | 0.3432 | 135 | 1.1555 | 7569880 |
| 0.5294 | 0.3559 | 140 | 1.1487 | 7848792 |
| 0.4983 | 0.3686 | 145 | 1.1543 | 8130552 |
| 0.4877 | 0.3813 | 150 | 1.1442 | 8409680 |
| 0.419 | 0.3940 | 155 | 1.1497 | 8691616 |
| 0.5136 | 0.4067 | 160 | 1.1437 | 8974984 |
| 0.4672 | 0.4194 | 165 | 1.1442 | 9258208 |
| 0.4665 | 0.4322 | 170 | 1.1359 | 9538392 |
| 0.4105 | 0.4449 | 175 | 1.1412 | 9818464 |
| 0.5283 | 0.4576 | 180 | 1.1360 | 10102088 |
| 0.4097 | 0.4703 | 185 | 1.1388 | 10385664 |
| 0.4573 | 0.4830 | 190 | 1.1324 | 10667816 |
| 0.4047 | 0.4957 | 195 | 1.1343 | 10947272 |
| 0.4657 | 0.5084 | 200 | 1.1281 | 11227664 |
| 0.3811 | 0.5211 | 205 | 1.1295 | 11509152 |
| 0.43 | 0.5338 | 210 | 1.1294 | 11792720 |
| 0.4653 | 0.5466 | 215 | 1.1250 | 12068688 |
| 0.3614 | 0.5593 | 220 | 1.1273 | 12350648 |
| 0.4405 | 0.5720 | 225 | 1.1234 | 12628784 |
| 0.3511 | 0.5847 | 230 | 1.1251 | 12907416 |
| 0.4004 | 0.5974 | 235 | 1.1223 | 13192632 |
| 0.4819 | 0.6101 | 240 | 1.1201 | 13469328 |
| 0.4378 | 0.6228 | 245 | 1.1201 | 13748984 |
| 0.3615 | 0.6355 | 250 | 1.1166 | 14033560 |
| 0.3767 | 0.6482 | 255 | 1.1185 | 14315712 |
| 0.3775 | 0.6609 | 260 | 1.1169 | 14599040 |
| 0.4632 | 0.6737 | 265 | 1.1152 | 14883880 |
| 0.3246 | 0.6864 | 270 | 1.1148 | 15161064 |
| 0.3381 | 0.6991 | 275 | 1.1136 | 15435968 |
| 0.3762 | 0.7118 | 280 | 1.1167 | 15715000 |
| 0.3853 | 0.7245 | 285 | 1.1128 | 15992552 |
| 0.4548 | 0.7372 | 290 | 1.1124 | 16277624 |
| 0.3692 | 0.7499 | 295 | 1.1102 | 16554696 |
| 0.423 | 0.7626 | 300 | 1.1101 | 16842640 |
| 0.3635 | 0.7753 | 305 | 1.1124 | 17126528 |
| 0.3939 | 0.7881 | 310 | 1.1096 | 17402024 |
| 0.4323 | 0.8008 | 315 | 1.1092 | 17679664 |
| 0.3539 | 0.8135 | 320 | 1.1073 | 17959928 |
| 0.4876 | 0.8262 | 325 | 1.1077 | 18241888 |
| 0.3201 | 0.8389 | 330 | 1.1077 | 18521608 |
| 0.3806 | 0.8516 | 335 | 1.1060 | 18805032 |
| 0.3601 | 0.8643 | 340 | 1.1062 | 19089648 |
| 0.3919 | 0.8770 | 345 | 1.1049 | 19371096 |
| 0.3816 | 0.8897 | 350 | 1.1069 | 19650992 |
| 0.3584 | 0.9024 | 355 | 1.1051 | 19923856 |
| 0.3534 | 0.9152 | 360 | 1.1057 | 20198240 |
| 0.4761 | 0.9279 | 365 | 1.1049 | 20480400 |
| 0.3723 | 0.9406 | 370 | 1.1053 | 20761832 |
| 0.4056 | 0.9533 | 375 | 1.1036 | 21048104 |
| 0.3886 | 0.9660 | 380 | 1.1024 | 21323808 |
| 0.5005 | 0.9787 | 385 | 1.1028 | 21602888 |
| 0.3638 | 0.9914 | 390 | 1.1039 | 21887576 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
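
A quick way to confirm that a local environment matches the versions above; the expected strings are taken from this list, and the snippet itself is only an illustrative check:

```python
# Compare installed package versions against those reported in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "Transformers": (transformers.__version__, "4.44.0"),
    "Pytorch": (torch.__version__, "2.4.0+cu121"),
    "Datasets": (datasets.__version__, "2.20.0"),
    "Tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (installed, wanted) in expected.items():
    status = "OK" if installed == wanted else f"expected {wanted}"
    print(f"{name}: {installed} ({status})")
```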