# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1141
- Num Input Tokens Seen: 38535136
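The snippet below is a minimal usage sketch, not part of the original card: it loads the checkpoint with the `transformers` auto classes and generates a short completion. The repository id is assumed from the model name above; adjust it if the weights are hosted elsewhere.

```python
# Minimal usage sketch: load the checkpoint and generate a short completion.
# The repository id is assumed from the model name; adjust if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```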
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
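As a point of reference, and not taken from the original training script (which is not included in this card), the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; the output directory is a placeholder and the training dataset is not documented here.

```python
# Sketch only: how the listed hyperparameters map onto TrainingArguments.
# output_dir is a placeholder; the training dataset is not documented here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```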
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.753 | 0.0072 | 5 | 1.3915 | 274800 |
1.645 | 0.0143 | 10 | 1.3471 | 553336 |
1.5097 | 0.0215 | 15 | 1.2784 | 825472 |
1.4043 | 0.0286 | 20 | 1.2252 | 1094400 |
1.2715 | 0.0358 | 25 | 1.1789 | 1368560 |
1.2172 | 0.0430 | 30 | 1.1681 | 1643752 |
1.1701 | 0.0501 | 35 | 1.1474 | 1919896 |
1.012 | 0.0573 | 40 | 1.1634 | 2202664 |
0.977 | 0.0645 | 45 | 1.1833 | 2485728 |
0.9504 | 0.0716 | 50 | 1.1892 | 2760656 |
0.8804 | 0.0788 | 55 | 1.2024 | 3030296 |
0.7706 | 0.0859 | 60 | 1.2172 | 3302344 |
0.7382 | 0.0931 | 65 | 1.2354 | 3580728 |
0.5907 | 0.1003 | 70 | 1.2216 | 3858040 |
0.5639 | 0.1074 | 75 | 1.2151 | 4131752 |
0.5866 | 0.1146 | 80 | 1.2167 | 4408168 |
0.6131 | 0.1217 | 85 | 1.2232 | 4684048 |
0.5387 | 0.1289 | 90 | 1.2203 | 4956816 |
0.588 | 0.1361 | 95 | 1.2124 | 5236664 |
0.5076 | 0.1432 | 100 | 1.2125 | 5512104 |
0.4164 | 0.1504 | 105 | 1.2181 | 5787680 |
0.4371 | 0.1576 | 110 | 1.2111 | 6061640 |
0.4415 | 0.1647 | 115 | 1.2035 | 6339744 |
0.4482 | 0.1719 | 120 | 1.2025 | 6616088 |
0.4337 | 0.1790 | 125 | 1.2025 | 6890352 |
0.4609 | 0.1862 | 130 | 1.1980 | 7161728 |
0.3955 | 0.1934 | 135 | 1.2066 | 7437056 |
0.4134 | 0.2005 | 140 | 1.1994 | 7714640 |
0.2926 | 0.2077 | 145 | 1.2044 | 7990680 |
0.5047 | 0.2148 | 150 | 1.1958 | 8272024 |
0.3491 | 0.2220 | 155 | 1.2003 | 8543152 |
0.3948 | 0.2292 | 160 | 1.1946 | 8817304 |
0.4029 | 0.2363 | 165 | 1.2019 | 9095752 |
0.2683 | 0.2435 | 170 | 1.1840 | 9367952 |
0.3407 | 0.2506 | 175 | 1.1988 | 9649744 |
0.3316 | 0.2578 | 180 | 1.1874 | 9915512 |
0.4204 | 0.2650 | 185 | 1.1885 | 10190280 |
0.2743 | 0.2721 | 190 | 1.1846 | 10465416 |
0.2852 | 0.2793 | 195 | 1.1833 | 10743016 |
0.3708 | 0.2865 | 200 | 1.1827 | 11018864 |
0.2405 | 0.2936 | 205 | 1.1810 | 11294712 |
0.3435 | 0.3008 | 210 | 1.1847 | 11566136 |
0.277 | 0.3079 | 215 | 1.1775 | 11839000 |
0.31 | 0.3151 | 220 | 1.1869 | 12110104 |
0.3004 | 0.3223 | 225 | 1.1719 | 12387072 |
0.2593 | 0.3294 | 230 | 1.1799 | 12659864 |
0.3017 | 0.3366 | 235 | 1.1710 | 12928592 |
0.3225 | 0.3437 | 240 | 1.1738 | 13203112 |
0.2976 | 0.3509 | 245 | 1.1753 | 13475880 |
0.2385 | 0.3581 | 250 | 1.1657 | 13751768 |
0.3222 | 0.3652 | 255 | 1.1733 | 14032088 |
0.2892 | 0.3724 | 260 | 1.1660 | 14306696 |
0.5871 | 0.3796 | 265 | 1.1624 | 14590560 |
0.3256 | 0.3867 | 270 | 1.1665 | 14862432 |
0.312 | 0.3939 | 275 | 1.1600 | 15143808 |
0.317 | 0.4010 | 280 | 1.1618 | 15415480 |
0.2964 | 0.4082 | 285 | 1.1640 | 15694936 |
0.3226 | 0.4154 | 290 | 1.1586 | 15974968 |
0.2756 | 0.4225 | 295 | 1.1595 | 16255032 |
0.2167 | 0.4297 | 300 | 1.1596 | 16539088 |
0.3576 | 0.4368 | 305 | 1.1566 | 16819088 |
0.2757 | 0.4440 | 310 | 1.1541 | 17100912 |
0.2413 | 0.4512 | 315 | 1.1550 | 17373744 |
0.3459 | 0.4583 | 320 | 1.1483 | 17647448 |
0.2882 | 0.4655 | 325 | 1.1493 | 17922920 |
0.2383 | 0.4727 | 330 | 1.1471 | 18194680 |
0.2872 | 0.4798 | 335 | 1.1510 | 18471192 |
0.2302 | 0.4870 | 340 | 1.1474 | 18747848 |
0.285 | 0.4941 | 345 | 1.1484 | 19026688 |
0.2765 | 0.5013 | 350 | 1.1456 | 19293616 |
0.1756 | 0.5085 | 355 | 1.1435 | 19570744 |
0.303 | 0.5156 | 360 | 1.1457 | 19845048 |
0.2726 | 0.5228 | 365 | 1.1422 | 20115096 |
0.2625 | 0.5299 | 370 | 1.1423 | 20395336 |
0.2419 | 0.5371 | 375 | 1.1430 | 20667208 |
0.1856 | 0.5443 | 380 | 1.1388 | 20948560 |
0.3427 | 0.5514 | 385 | 1.1400 | 21218968 |
0.2147 | 0.5586 | 390 | 1.1354 | 21489088 |
0.2514 | 0.5658 | 395 | 1.1387 | 21764248 |
0.293 | 0.5729 | 400 | 1.1345 | 22038944 |
0.2699 | 0.5801 | 405 | 1.1349 | 22312360 |
0.2219 | 0.5872 | 410 | 1.1353 | 22589016 |
0.3573 | 0.5944 | 415 | 1.1305 | 22864576 |
0.343 | 0.6016 | 420 | 1.1355 | 23144760 |
0.2924 | 0.6087 | 425 | 1.1347 | 23421952 |
0.2846 | 0.6159 | 430 | 1.1293 | 23700352 |
0.2971 | 0.6230 | 435 | 1.1328 | 23983624 |
0.2037 | 0.6302 | 440 | 1.1312 | 24263512 |
0.29 | 0.6374 | 445 | 1.1309 | 24530624 |
0.2089 | 0.6445 | 450 | 1.1317 | 24800848 |
0.2477 | 0.6517 | 455 | 1.1318 | 25080464 |
0.2275 | 0.6588 | 460 | 1.1265 | 25356832 |
0.2335 | 0.6660 | 465 | 1.1285 | 25638344 |
0.1839 | 0.6732 | 470 | 1.1326 | 25912488 |
0.2514 | 0.6803 | 475 | 1.1276 | 26189888 |
0.3751 | 0.6875 | 480 | 1.1271 | 26472040 |
0.2701 | 0.6947 | 485 | 1.1260 | 26753624 |
0.2235 | 0.7018 | 490 | 1.1254 | 27029592 |
0.244 | 0.7090 | 495 | 1.1246 | 27311520 |
0.2294 | 0.7161 | 500 | 1.1231 | 27586432 |
0.2949 | 0.7233 | 505 | 1.1247 | 27860176 |
0.1593 | 0.7305 | 510 | 1.1254 | 28137160 |
0.2553 | 0.7376 | 515 | 1.1257 | 28418864 |
0.1885 | 0.7448 | 520 | 1.1249 | 28696856 |
0.2695 | 0.7519 | 525 | 1.1251 | 28975192 |
0.2545 | 0.7591 | 530 | 1.1214 | 29251760 |
0.2446 | 0.7663 | 535 | 1.1211 | 29528808 |
0.3202 | 0.7734 | 540 | 1.1233 | 29803128 |
0.2623 | 0.7806 | 545 | 1.1200 | 30079416 |
0.2142 | 0.7878 | 550 | 1.1205 | 30352064 |
0.2502 | 0.7949 | 555 | 1.1210 | 30629824 |
0.3042 | 0.8021 | 560 | 1.1180 | 30904272 |
0.197 | 0.8092 | 565 | 1.1196 | 31174976 |
0.2593 | 0.8164 | 570 | 1.1191 | 31446624 |
0.3324 | 0.8236 | 575 | 1.1183 | 31729592 |
0.2113 | 0.8307 | 580 | 1.1203 | 32004000 |
0.2764 | 0.8379 | 585 | 1.1196 | 32277080 |
0.2863 | 0.8450 | 590 | 1.1166 | 32551352 |
0.1917 | 0.8522 | 595 | 1.1213 | 32831496 |
0.1784 | 0.8594 | 600 | 1.1194 | 33113448 |
0.2198 | 0.8665 | 605 | 1.1173 | 33387680 |
0.3067 | 0.8737 | 610 | 1.1185 | 33664656 |
0.2372 | 0.8809 | 615 | 1.1154 | 33938472 |
0.2207 | 0.8880 | 620 | 1.1172 | 34216144 |
0.2026 | 0.8952 | 625 | 1.1177 | 34487704 |
0.2003 | 0.9023 | 630 | 1.1144 | 34767944 |
0.2438 | 0.9095 | 635 | 1.1178 | 35042160 |
0.3055 | 0.9167 | 640 | 1.1154 | 35322704 |
0.2598 | 0.9238 | 645 | 1.1137 | 35599184 |
0.2283 | 0.9310 | 650 | 1.1163 | 35874368 |
0.2463 | 0.9381 | 655 | 1.1152 | 36142120 |
0.2388 | 0.9453 | 660 | 1.1133 | 36411336 |
0.2284 | 0.9525 | 665 | 1.1161 | 36697696 |
0.2146 | 0.9596 | 670 | 1.1133 | 36973112 |
0.2494 | 0.9668 | 675 | 1.1151 | 37252568 |
0.2118 | 0.9740 | 680 | 1.1151 | 37528656 |
0.2539 | 0.9811 | 685 | 1.1131 | 37804520 |
0.2345 | 0.9883 | 690 | 1.1137 | 38078360 |
0.2216 | 0.9954 | 695 | 1.1142 | 38361640 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
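One way to pin the versions listed above in a fresh environment is sketched below; this is a convenience snippet, not part of the original card, and plain `torch==2.4.0` may resolve to a different CUDA build than the `2.4.0+cu121` wheel used for training.

```python
# Convenience sketch: install the framework versions listed above.
# Note: the card lists Pytorch 2.4.0+cu121; torch==2.4.0 from PyPI may be a
# different CUDA (or CPU-only) build.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "transformers==4.44.0",
    "torch==2.4.0",
    "datasets==2.20.0",
    "tokenizers==0.19.1",
])
```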