# tinyllama-icd_qa_5q_all_or_nothing
This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 1.0245
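Since only a raw loss is reported, a quick way to make the number interpretable is to convert it to perplexity via `exp(loss)`; this assumes the loss is the usual mean token-level cross-entropy, which is how the `transformers` Trainer reports it for causal LMs.

```python
import math

# Final validation loss reported above; perplexity = exp(cross-entropy),
# assuming the loss is mean token-level cross-entropy.
eval_loss = 1.0245
print(f"perplexity ≈ {math.exp(eval_loss):.2f}")  # ≈ 2.79
```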
## Model description

More information needed

## Intended uses & limitations

More information needed
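No usage instructions are provided, so here is a minimal loading sketch with `transformers`. The namespace in `model_id`, the prompt format, and the generation settings are all assumptions rather than details documented by this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: the namespace before the slash is an assumption.
model_id = "your-namespace/tinyllama-icd_qa_5q_all_or_nothing"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The prompt format used during fine-tuning is not documented; plain text
# is shown here as a placeholder.
inputs = tokenizer("Question: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```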
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
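For reference, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; `output_dir` and the rest of the Trainer wiring are assumptions, not taken from this card.

```python
from transformers import TrainingArguments

# Sketch reproducing the listed hyperparameters; output_dir is an assumption.
training_args = TrainingArguments(
    output_dir="tinyllama-icd_qa_5q_all_or_nothing",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size of 2
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    fp16=True,  # native AMP mixed precision
)
```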
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.897 | 0.0421 | 1000 | 2.0673 |
1.8168 | 0.0842 | 2000 | 1.8269 |
1.8585 | 0.1264 | 3000 | 1.7420 |
1.4869 | 0.1685 | 4000 | 1.6427 |
1.4204 | 0.2106 | 5000 | 1.5763 |
1.2253 | 0.2527 | 6000 | 1.5298 |
1.326 | 0.2948 | 7000 | 1.4959 |
1.5507 | 0.3370 | 8000 | 1.4615 |
1.2435 | 0.3791 | 9000 | 1.4142 |
1.5801 | 0.4212 | 10000 | 1.3806 |
1.1879 | 0.4633 | 11000 | 1.3511 |
1.2121 | 0.5054 | 12000 | 1.3304 |
1.0262 | 0.5476 | 13000 | 1.3335 |
1.0754 | 0.5897 | 14000 | 1.2921 |
1.0264 | 0.6318 | 15000 | 1.2785 |
1.0067 | 0.6739 | 16000 | 1.2544 |
1.1532 | 0.7160 | 17000 | 1.2323 |
1.1084 | 0.7582 | 18000 | 1.2354 |
1.1094 | 0.8003 | 19000 | 1.2106 |
0.9589 | 0.8424 | 20000 | 1.1987 |
0.9922 | 0.8845 | 21000 | 1.1946 |
1.0851 | 0.9266 | 22000 | 1.1723 |
1.0407 | 0.9688 | 23000 | 1.1528 |
0.7835 | 1.0109 | 24000 | 1.1521 |
0.9736 | 1.0530 | 25000 | 1.1451 |
0.9463 | 1.0951 | 26000 | 1.1424 |
0.9879 | 1.1372 | 27000 | 1.1394 |
0.8803 | 1.1794 | 28000 | 1.1301 |
0.9309 | 1.2215 | 29000 | 1.1380 |
0.9263 | 1.2636 | 30000 | 1.1157 |
0.975 | 1.3057 | 31000 | 1.1028 |
0.7661 | 1.3479 | 32000 | 1.0962 |
0.8526 | 1.3900 | 33000 | 1.0979 |
0.9277 | 1.4321 | 34000 | 1.0948 |
0.8177 | 1.4742 | 35000 | 1.0887 |
0.8935 | 1.5163 | 36000 | 1.0810 |
0.8534 | 1.5585 | 37000 | 1.0868 |
0.8604 | 1.6006 | 38000 | 1.0673 |
0.8426 | 1.6427 | 39000 | 1.0699 |
0.9027 | 1.6848 | 40000 | 1.0588 |
0.8062 | 1.7269 | 41000 | 1.0487 |
1.0168 | 1.7691 | 42000 | 1.0508 |
0.8437 | 1.8112 | 43000 | 1.0416 |
0.9178 | 1.8533 | 44000 | 1.0256 |
0.9543 | 1.8954 | 45000 | 1.0266 |
0.787 | 1.9375 | 46000 | 1.0247 |
0.7192 | 1.9797 | 47000 | 1.0170 |
0.8496 | 2.0218 | 48000 | 1.0374 |
0.7649 | 2.0639 | 49000 | 1.0333 |
0.7686 | 2.1060 | 50000 | 1.0282 |
0.6953 | 2.1481 | 51000 | 1.0420 |
0.8024 | 2.1903 | 52000 | 1.0333 |
0.823 | 2.2324 | 53000 | 1.0265 |
0.6479 | 2.2745 | 54000 | 1.0156 |
0.7726 | 2.3166 | 55000 | 1.0142 |
0.7353 | 2.3587 | 56000 | 1.0093 |
0.6597 | 2.4009 | 57000 | 1.0133 |
0.8428 | 2.4430 | 58000 | 1.0154 |
0.7129 | 2.4851 | 59000 | 1.0113 |
0.6315 | 2.5272 | 60000 | 1.0110 |
0.7019 | 2.5693 | 61000 | 1.0101 |
0.722 | 2.6115 | 62000 | 0.9969 |
0.8638 | 2.6536 | 63000 | 0.9973 |
0.9256 | 2.6957 | 64000 | 0.9930 |
0.6812 | 2.7378 | 65000 | 0.9942 |
0.7772 | 2.7799 | 66000 | 0.9978 |
0.6935 | 2.8221 | 67000 | 0.9807 |
0.7865 | 2.8642 | 68000 | 0.9797 |
0.758 | 2.9063 | 69000 | 0.9857 |
0.8565 | 2.9484 | 70000 | 0.9729 |
0.7374 | 2.9905 | 71000 | 0.9722 |
0.6262 | 3.0327 | 72000 | 1.0030 |
0.6128 | 3.0748 | 73000 | 1.0083 |
0.587 | 3.1169 | 74000 | 1.0000 |
0.6267 | 3.1590 | 75000 | 1.0184 |
0.5878 | 3.2011 | 76000 | 1.0107 |
0.6382 | 3.2433 | 77000 | 1.0090 |
0.738 | 3.2854 | 78000 | 1.0005 |
0.6962 | 3.3275 | 79000 | 1.0091 |
0.6249 | 3.3696 | 80000 | 1.0081 |
0.6458 | 3.4117 | 81000 | 1.0061 |
0.6177 | 3.4539 | 82000 | 1.0010 |
0.6046 | 3.4960 | 83000 | 1.0046 |
0.6263 | 3.5381 | 84000 | 1.0010 |
0.6269 | 3.5802 | 85000 | 0.9995 |
0.6503 | 3.6223 | 86000 | 1.0010 |
0.6702 | 3.6645 | 87000 | 0.9906 |
0.6865 | 3.7066 | 88000 | 0.9858 |
0.5789 | 3.7487 | 89000 | 0.9858 |
0.6636 | 3.7908 | 90000 | 0.9817 |
0.622 | 3.8330 | 91000 | 0.9871 |
0.5741 | 3.8751 | 92000 | 0.9849 |
0.6681 | 3.9172 | 93000 | 0.9766 |
0.6471 | 3.9593 | 94000 | 0.9739 |
0.5567 | 4.0014 | 95000 | 0.9776 |
0.5318 | 4.0436 | 96000 | 1.0329 |
0.5967 | 4.0857 | 97000 | 1.0379 |
0.5666 | 4.1278 | 98000 | 1.0377 |
0.5573 | 4.1699 | 99000 | 1.0397 |
0.533 | 4.2120 | 100000 | 1.0324 |
0.5331 | 4.2542 | 101000 | 1.0346 |
0.5689 | 4.2963 | 102000 | 1.0376 |
0.5983 | 4.3384 | 103000 | 1.0354 |
0.5405 | 4.3805 | 104000 | 1.0281 |
0.5718 | 4.4226 | 105000 | 1.0357 |
0.5416 | 4.4648 | 106000 | 1.0303 |
0.5482 | 4.5069 | 107000 | 1.0312 |
0.5459 | 4.5490 | 108000 | 1.0268 |
0.563 | 4.5911 | 109000 | 1.0300 |
0.549 | 4.6332 | 110000 | 1.0277 |
0.5049 | 4.6754 | 111000 | 1.0290 |
0.593 | 4.7175 | 112000 | 1.0259 |
0.5144 | 4.7596 | 113000 | 1.0240 |
0.6079 | 4.8017 | 114000 | 1.0242 |
0.4864 | 4.8438 | 115000 | 1.0257 |
0.5388 | 4.8860 | 116000 | 1.0257 |
0.5368 | 4.9281 | 117000 | 1.0264 |
0.4607 | 4.9702 | 118000 | 1.0245 |
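Reading the curve above: validation loss reaches its minimum (≈ 0.972) near the end of epoch 3 and drifts upward through epochs 4 and 5, so the final checkpoint's 1.0245 likely reflects mild overfitting; an earlier checkpoint may generalize slightly better.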
### Framework versions
- Transformers 4.46.2
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3