tinyllama-icd_qa_5q_all_or_nothing

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T on an unspecified dataset (the training data is not documented here). It achieves the following results on the evaluation set:

  • Loss: 1.0245
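The checkpoint can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch; the prompt format and generation settings are assumptions, since the card does not document them:

```python
def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a completion from the fine-tuned checkpoint."""
    # Heavy dependencies are imported lazily so defining this sketch needs none.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "2ndBestKiller/tinyllama-icd_qa_5q_all_or_nothing"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling, e.g., `generate("Q: ...")` downloads the weights on first use; how the question-answering prompts were formatted during fine-tuning is not specified in this card.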

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 5
  • mixed_precision_training: Native AMP
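With lr_scheduler_type set to linear and no warmup steps listed, the learning rate presumably decays linearly from 5e-05 at step 0 to 0 at the end of training. A sketch of that schedule; the total step count is an assumption extrapolated from the step/epoch ratio in the results table:

```python
BASE_LR = 5e-05
TOTAL_STEPS = 118_708  # assumption: ~5 epochs at ~23,740 optimizer steps/epoch

def linear_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps (no warmup)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

Note also that with train_batch_size 1 and gradient_accumulation_steps 2, the effective (total) train batch size is 2, matching the value above.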

Training results

Training Loss | Epoch | Step | Validation Loss
1.897 0.0421 1000 2.0673
1.8168 0.0842 2000 1.8269
1.8585 0.1264 3000 1.7420
1.4869 0.1685 4000 1.6427
1.4204 0.2106 5000 1.5763
1.2253 0.2527 6000 1.5298
1.326 0.2948 7000 1.4959
1.5507 0.3370 8000 1.4615
1.2435 0.3791 9000 1.4142
1.5801 0.4212 10000 1.3806
1.1879 0.4633 11000 1.3511
1.2121 0.5054 12000 1.3304
1.0262 0.5476 13000 1.3335
1.0754 0.5897 14000 1.2921
1.0264 0.6318 15000 1.2785
1.0067 0.6739 16000 1.2544
1.1532 0.7160 17000 1.2323
1.1084 0.7582 18000 1.2354
1.1094 0.8003 19000 1.2106
0.9589 0.8424 20000 1.1987
0.9922 0.8845 21000 1.1946
1.0851 0.9266 22000 1.1723
1.0407 0.9688 23000 1.1528
0.7835 1.0109 24000 1.1521
0.9736 1.0530 25000 1.1451
0.9463 1.0951 26000 1.1424
0.9879 1.1372 27000 1.1394
0.8803 1.1794 28000 1.1301
0.9309 1.2215 29000 1.1380
0.9263 1.2636 30000 1.1157
0.975 1.3057 31000 1.1028
0.7661 1.3479 32000 1.0962
0.8526 1.3900 33000 1.0979
0.9277 1.4321 34000 1.0948
0.8177 1.4742 35000 1.0887
0.8935 1.5163 36000 1.0810
0.8534 1.5585 37000 1.0868
0.8604 1.6006 38000 1.0673
0.8426 1.6427 39000 1.0699
0.9027 1.6848 40000 1.0588
0.8062 1.7269 41000 1.0487
1.0168 1.7691 42000 1.0508
0.8437 1.8112 43000 1.0416
0.9178 1.8533 44000 1.0256
0.9543 1.8954 45000 1.0266
0.787 1.9375 46000 1.0247
0.7192 1.9797 47000 1.0170
0.8496 2.0218 48000 1.0374
0.7649 2.0639 49000 1.0333
0.7686 2.1060 50000 1.0282
0.6953 2.1481 51000 1.0420
0.8024 2.1903 52000 1.0333
0.823 2.2324 53000 1.0265
0.6479 2.2745 54000 1.0156
0.7726 2.3166 55000 1.0142
0.7353 2.3587 56000 1.0093
0.6597 2.4009 57000 1.0133
0.8428 2.4430 58000 1.0154
0.7129 2.4851 59000 1.0113
0.6315 2.5272 60000 1.0110
0.7019 2.5693 61000 1.0101
0.722 2.6115 62000 0.9969
0.8638 2.6536 63000 0.9973
0.9256 2.6957 64000 0.9930
0.6812 2.7378 65000 0.9942
0.7772 2.7799 66000 0.9978
0.6935 2.8221 67000 0.9807
0.7865 2.8642 68000 0.9797
0.758 2.9063 69000 0.9857
0.8565 2.9484 70000 0.9729
0.7374 2.9905 71000 0.9722
0.6262 3.0327 72000 1.0030
0.6128 3.0748 73000 1.0083
0.587 3.1169 74000 1.0000
0.6267 3.1590 75000 1.0184
0.5878 3.2011 76000 1.0107
0.6382 3.2433 77000 1.0090
0.738 3.2854 78000 1.0005
0.6962 3.3275 79000 1.0091
0.6249 3.3696 80000 1.0081
0.6458 3.4117 81000 1.0061
0.6177 3.4539 82000 1.0010
0.6046 3.4960 83000 1.0046
0.6263 3.5381 84000 1.0010
0.6269 3.5802 85000 0.9995
0.6503 3.6223 86000 1.0010
0.6702 3.6645 87000 0.9906
0.6865 3.7066 88000 0.9858
0.5789 3.7487 89000 0.9858
0.6636 3.7908 90000 0.9817
0.622 3.8330 91000 0.9871
0.5741 3.8751 92000 0.9849
0.6681 3.9172 93000 0.9766
0.6471 3.9593 94000 0.9739
0.5567 4.0014 95000 0.9776
0.5318 4.0436 96000 1.0329
0.5967 4.0857 97000 1.0379
0.5666 4.1278 98000 1.0377
0.5573 4.1699 99000 1.0397
0.533 4.2120 100000 1.0324
0.5331 4.2542 101000 1.0346
0.5689 4.2963 102000 1.0376
0.5983 4.3384 103000 1.0354
0.5405 4.3805 104000 1.0281
0.5718 4.4226 105000 1.0357
0.5416 4.4648 106000 1.0303
0.5482 4.5069 107000 1.0312
0.5459 4.5490 108000 1.0268
0.563 4.5911 109000 1.0300
0.549 4.6332 110000 1.0277
0.5049 4.6754 111000 1.0290
0.593 4.7175 112000 1.0259
0.5144 4.7596 113000 1.0240
0.6079 4.8017 114000 1.0242
0.4864 4.8438 115000 1.0257
0.5388 4.8860 116000 1.0257
0.5368 4.9281 117000 1.0264
0.4607 4.9702 118000 1.0245
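The Epoch and Step columns are consistent with roughly 23,700 optimizer steps per epoch, and the validation loss bottoms out at 0.9722 near the end of epoch 3 before drifting upward through epochs 4 and 5, so the best checkpoint may precede the final one. A quick sanity check, using only values copied from the table:

```python
# Values copied verbatim from the training-results table above.
final_step, final_epoch = 118_000, 4.9702
steps_per_epoch = final_step / final_epoch        # ~23,741 steps per epoch

# Validation loss at two notable checkpoints (step -> eval loss):
val_loss = {71_000: 0.9722, 118_000: 1.0245}
best_step = min(val_loss, key=val_loss.get)       # lowest eval loss of these rows
```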

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
Model size: 1.1B params · Tensor type: F32 · Format: Safetensors