llama3.1-8B-eeszt-structured

This model is a fine-tuned version of meta-llama/Llama-3.1-8B. The training dataset is not specified in this card. It achieves the following results on the evaluation set:

Loss: 1.3304

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: paged_adamw_32bit with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
training_steps: 500
mixed_precision_training: Native AMP
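
The relationship between the batch-size entries, and the shape of the cosine schedule, can be sketched in a few lines of Python. This is an illustration only: the card does not state the number of devices or any warmup, so a single device and zero warmup steps are assumed here, and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
TRAINING_STEPS = 500

# Assumptions, not stated in the card: one device, no warmup.
NUM_DEVICES = 1
WARMUP_STEPS = 0


def effective_batch_size(per_device, accum_steps, devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * devices


def cosine_lr(step, base_lr=LEARNING_RATE, total_steps=TRAINING_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Half-cosine decay from base_lr to 0, with optional linear warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 125, 250, 375, 500):
        print(step, cosine_lr(step))
```

Under these assumptions, 4 examples per device times 4 accumulation steps gives the listed total_train_batch_size of 16, and the learning rate falls from 0.0002 at step 0 to zero at step 500.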

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.8889	4	1.6790
2.1272	1.8333	8	1.5754
1.8869	2.7778	12	1.4449
1.8458	3.9444	17	1.3113
1.5497	4.8889	21	1.2161
1.4996	5.8333	25	1.1479
1.4996	6.7778	29	1.0829
1.5	7.9444	34	1.0096
1.1576	8.8889	38	0.9470
1.1188	9.8333	42	0.9070
0.881	10.7778	46	0.8688
0.9199	11.9444	51	0.8224
0.7161	12.8889	55	0.7994
0.7161	13.8333	59	0.7957
0.7983	14.7778	63	0.7891
0.5833	15.9444	68	0.7692
0.5577	16.8889	72	0.7593
0.4911	17.8333	76	0.7867
0.4478	18.7778	80	0.8088
0.5181	19.9444	85	0.8089
0.5181	20.8889	89	0.7761
0.3977	21.8333	93	0.7940
0.3655	22.7778	97	0.8387
0.293	23.9444	102	0.8603
0.2978	24.8889	106	0.8603
0.2573	25.8333	110	0.8431
0.2573	26.7778	114	0.9431
0.2802	27.9444	119	0.9213
0.2116	28.8889	123	0.9327
0.208	29.8333	127	0.9562
0.2012	30.7778	131	0.9036
0.1807	31.9444	136	0.9352
0.1885	32.8889	140	1.0403
0.1885	33.8333	144	0.9444
0.1898	34.7778	148	0.9924
0.1504	35.9444	153	1.0616
0.14	36.8889	157	0.9799
0.1428	37.8333	161	1.0503
0.1174	38.7778	165	1.0565
0.1513	39.9444	170	1.0090
0.1513	40.8889	174	1.0892
0.1053	41.8333	178	1.0162
0.1056	42.7778	182	1.1173
0.1127	43.9444	187	1.0811
0.0927	44.8889	191	1.0970
0.0963	45.8333	195	1.0959
0.0963	46.7778	199	1.0603
0.1043	47.9444	204	1.1082
0.0845	48.8889	208	1.0794
0.0728	49.8333	212	1.1056
0.0779	50.7778	216	1.1265
0.0706	51.9444	221	1.1261
0.06	52.8889	225	1.1191
0.06	53.8333	229	1.1820
0.0692	54.7778	233	1.1651
0.0558	55.9444	238	1.1954
0.0529	56.8889	242	1.1271
0.054	57.8333	246	1.0981
0.0491	58.7778	250	1.1937
0.0588	59.9444	255	1.1734
0.0588	60.8889	259	1.2405
0.0435	61.8333	263	1.1687
0.0394	62.7778	267	1.1928
0.0446	63.9444	272	1.2214
0.0414	64.8889	276	1.2216
0.0378	65.8333	280	1.2238
0.0378	66.7778	284	1.2372
0.0455	67.9444	289	1.2214
0.0377	68.8889	293	1.2555
0.0327	69.8333	297	1.2370
0.033	70.7778	301	1.2383
0.0342	71.9444	306	1.2499
0.032	72.8889	310	1.2769
0.032	73.8333	314	1.2521
0.0389	74.7778	318	1.2544
0.0312	75.9444	323	1.2710
0.0294	76.8889	327	1.2853
0.0269	77.8333	331	1.2947
0.028	78.7778	335	1.3076
0.0334	79.9444	340	1.3095
0.0334	80.8889	344	1.2938
0.0257	81.8333	348	1.2813
0.0265	82.7778	352	1.2840
0.0262	83.9444	357	1.2902
0.0243	84.8889	361	1.3001
0.0232	85.8333	365	1.3042
0.0232	86.7778	369	1.3044
0.027	87.9444	374	1.2909
0.0224	88.8889	378	1.2925
0.0239	89.8333	382	1.2949
0.0221	90.7778	386	1.3046
0.0244	91.9444	391	1.3120
0.0256	92.8889	395	1.3179
0.0256	93.8333	399	1.3150
0.0276	94.7778	403	1.3069
0.0226	95.9444	408	1.2978
0.0279	96.8889	412	1.2995
0.0218	97.8333	416	1.3054
0.0224	98.7778	420	1.3163
0.0236	99.9444	425	1.3296
0.0236	100.8889	429	1.3317
0.021	101.8333	433	1.3305
0.0208	102.7778	437	1.3273
0.0205	103.9444	442	1.3253
0.0213	104.8889	446	1.3249
0.0208	105.8333	450	1.3257
0.0208	106.7778	454	1.3263
0.0221	107.9444	459	1.3271
0.0223	108.8889	463	1.3279
0.0194	109.8333	467	1.3291
0.0207	110.7778	471	1.3293
0.0211	111.9444	476	1.3296
0.0193	112.8889	480	1.3302
0.0193	113.8333	484	1.3301
0.0217	114.7778	488	1.3295
0.0201	115.9444	493	1.3301
0.0201	116.8889	497	1.3305
0.0201	117.6111	500	1.3304
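
In the table above, validation loss reaches its lowest value well before the final step and then rises while training loss keeps falling, a pattern that is usually read as overfitting. The sketch below locates the best row from an excerpt of the table. The row selection and the helper name are illustrative, and whether a checkpoint from that step was saved is not stated in the card.

```python
# (step, epoch, validation_loss) rows copied from the table above.
# This is an excerpt, not the full log.
ROWS = [
    (4, 0.8889, 1.6790),
    (34, 7.9444, 1.0096),
    (68, 15.9444, 0.7692),
    (72, 16.8889, 0.7593),
    (76, 17.8333, 0.7867),
    (89, 20.8889, 0.7761),
    (127, 29.8333, 0.9562),
    (250, 58.7778, 1.1937),
    (500, 117.6111, 1.3304),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best_step, best_epoch, best_loss = best_row(ROWS)
final_step, final_epoch, final_loss = ROWS[-1]

if __name__ == "__main__":
    print(f"lowest validation loss {best_loss} at step {best_step}")
    print(f"final validation loss {final_loss} at step {final_step}")
    print(f"gap: {final_loss - best_loss:.4f}")
```

The reported evaluation loss of 1.3304 corresponds to the last row, so a run with early stopping or best-checkpoint selection might report a noticeably lower figure.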

Framework versions

PEFT 0.13.2
Transformers 4.46.1
Pytorch 2.5.1+cu124
Datasets 3.0.2
Tokenizers 0.20.1
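
Because PEFT appears among the framework versions, the repository presumably holds an adapter rather than full model weights. A minimal loading sketch under that assumption follows; the repository id is taken from the model page, the function name is hypothetical, and the tokenizer is loaded from the base model because the card does not say whether the adapter repository ships one.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B"  # base model named in this card
# Assumed to contain a PEFT adapter; the card does not confirm this.
ADAPTER_REPO_ID = "aborcs/llama3.1-8B-eeszt-structured"


def load_adapter_model(adapter_repo_id=ADAPTER_REPO_ID,
                       base_model_id=BASE_MODEL_ID):
    """Load the base model with the adapter applied, plus a tokenizer."""
    # Imported lazily so the heavy dependencies are only needed on call.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Resolves the base model from the adapter config, then attaches the adapter.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_repo_id)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter_model()` downloads the 8B base weights, which are gated on the Hugging Face Hub and require accepted access terms.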
