---
library_name: peft
license: other
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- llama-factory
- lora
- unsloth
- generated_from_trainer
model-index:
- name: llm3br256
  results: []
---

# llm3br256

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the dbischof_premise_aea dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0136

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0896        | 0.0387 | 5    | 0.0767          |
| 0.057         | 0.0774 | 10   | 0.0397          |
| 0.0361        | 0.1162 | 15   | 0.0325          |
| 0.0478        | 0.1549 | 20   | 0.0304          |
| 0.0293        | 0.1936 | 25   | 0.0270          |
| 0.0429        | 0.2323 | 30   | 0.0253          |
| 0.0368        | 0.2711 | 35   | 0.0244          |
| 0.0323        | 0.3098 | 40   | 0.0229          |
| 0.0223        | 0.3485 | 45   | 0.0225          |
| 0.0327        | 0.3872 | 50   | 0.0216          |
| 0.0237        | 0.4259 | 55   | 0.0209          |
| 0.0255        | 0.4647 | 60   | 0.0204          |
| 0.0237        | 0.5034 | 65   | 0.0197          |
| 0.0273        | 0.5421 | 70   | 0.0197          |
| 0.0192        | 0.5808 | 75   | 0.0192          |
| 0.0459        | 0.6196 | 80   | 0.0188          |
| 0.0203        | 0.6583 | 85   | 0.0185          |
| 0.032         | 0.6970 | 90   | 0.0183          |
| 0.0145        | 0.7357 | 95   | 0.0184          |
| 0.0299        | 0.7744 | 100  | 0.0181          |
| 0.0186        | 0.8132 | 105  | 0.0183          |
| 0.0255        | 0.8519 | 110  | 0.0178          |
| 0.0199        | 0.8906 | 115  | 0.0177          |
| 0.0216        | 0.9293 | 120  | 0.0173          |
| 0.024         | 0.9681 | 125  | 0.0176          |
| 0.0319        | 1.0068 | 130  | 0.0173          |
| 0.0202        | 1.0455 | 135  | 0.0176          |
| 0.0167        | 1.0842 | 140  | 0.0171          |
| 0.0205        | 1.1229 | 145  | 0.0168          |
| 0.0164        | 1.1617 | 150  | 0.0167          |
| 0.0303        | 1.2004 | 155  | 0.0168          |
| 0.0201        | 1.2391 | 160  | 0.0165          |
| 0.0183        | 1.2778 | 165  | 0.0164          |
| 0.0221        | 1.3166 | 170  | 0.0163          |
| 0.0132        | 1.3553 | 175  | 0.0162          |
| 0.0226        | 1.3940 | 180  | 0.0158          |
| 0.0173        | 1.4327 | 185  | 0.0159          |
| 0.0304        | 1.4714 | 190  | 0.0164          |
| 0.0177        | 1.5102 | 195  | 0.0161          |
| 0.0155        | 1.5489 | 200  | 0.0160          |
| 0.0258        | 1.5876 | 205  | 0.0159          |
| 0.0217        | 1.6263 | 210  | 0.0163          |
| 0.0197        | 1.6651 | 215  | 0.0161          |
| 0.0124        | 1.7038 | 220  | 0.0158          |
| 0.0248        | 1.7425 | 225  | 0.0156          |
| 0.017         | 1.7812 | 230  | 0.0159          |
| 0.0248        | 1.8199 | 235  | 0.0158          |
| 0.0189        | 1.8587 | 240  | 0.0155          |
| 0.0185        | 1.8974 | 245  | 0.0151          |
| 0.0154        | 1.9361 | 250  | 0.0151          |
| 0.0223        | 1.9748 | 255  | 0.0152          |
| 0.0161        | 2.0136 | 260  | 0.0152          |
| 0.0139        | 2.0523 | 265  | 0.0154          |
| 0.0173        | 2.0910 | 270  | 0.0153          |
| 0.0237        | 2.1297 | 275  | 0.0152          |
| 0.0167        | 2.1684 | 280  | 0.0151          |
| 0.0086        | 2.2072 | 285  | 0.0149          |
| 0.012         | 2.2459 | 290  | 0.0147          |
| 0.015         | 2.2846 | 295  | 0.0149          |
| 0.0165        | 2.3233 | 300  | 0.0151          |
| 0.0183        | 2.3621 | 305  | 0.0150          |
| 0.0233        | 2.4008 | 310  | 0.0151          |
| 0.0163        | 2.4395 | 315  | 0.0149          |
| 0.0121        | 2.4782 | 320  | 0.0147          |
| 0.0213        | 2.5169 | 325  | 0.0145          |
| 0.0253        | 2.5557 | 330  | 0.0145          |
| 0.023         | 2.5944 | 335  | 0.0149          |
| 0.014         | 2.6331 | 340  | 0.0144          |
| 0.0156        | 2.6718 | 345  | 0.0145          |
| 0.0164        | 2.7106 | 350  | 0.0143          |
| 0.0262        | 2.7493 | 355  | 0.0140          |
| 0.0134        | 2.7880 | 360  | 0.0142          |
| 0.018         | 2.8267 | 365  | 0.0144          |
| 0.0166        | 2.8654 | 370  | 0.0145          |
| 0.0204        | 2.9042 | 375  | 0.0141          |
| 0.0284        | 2.9429 | 380  | 0.0139          |
| 0.021         | 2.9816 | 385  | 0.0139          |
| 0.0125        | 3.0203 | 390  | 0.0145          |
| 0.0157        | 3.0591 | 395  | 0.0145          |
| 0.0136        | 3.0978 | 400  | 0.0142          |
| 0.0087        | 3.1365 | 405  | 0.0141          |
| 0.0217        | 3.1752 | 410  | 0.0139          |
| 0.0125        | 3.2139 | 415  | 0.0136          |
| 0.0115        | 3.2527 | 420  | 0.0138          |
| 0.0128        | 3.2914 | 425  | 0.0139          |
| 0.0278        | 3.3301 | 430  | 0.0138          |
| 0.0197        | 3.3688 | 435  | 0.0136          |
| 0.0095        | 3.4076 | 440  | 0.0133          |
| 0.0075        | 3.4463 | 445  | 0.0133          |
| 0.0112        | 3.4850 | 450  | 0.0136          |
| 0.0129        | 3.5237 | 455  | 0.0137          |
| 0.011         | 3.5624 | 460  | 0.0136          |
| 0.0233        | 3.6012 | 465  | 0.0136          |
| 0.0132        | 3.6399 | 470  | 0.0134          |
| 0.0147        | 3.6786 | 475  | 0.0136          |
| 0.0073        | 3.7173 | 480  | 0.0136          |
| 0.0143        | 3.7561 | 485  | 0.0136          |
| 0.0086        | 3.7948 | 490  | 0.0137          |
| 0.0055        | 3.8335 | 495  | 0.0138          |
| 0.0108        | 3.8722 | 500  | 0.0138          |
| 0.0079        | 3.9109 | 505  | 0.0136          |
| 0.0105        | 3.9497 | 510  | 0.0133          |
| 0.0117        | 3.9884 | 515  | 0.0133          |
| 0.008         | 4.0271 | 520  | 0.0135          |
| 0.0147        | 4.0658 | 525  | 0.0137          |
| 0.007         | 4.1045 | 530  | 0.0143          |
| 0.0059        | 4.1433 | 535  | 0.0146          |
| 0.015         | 4.1820 | 540  | 0.0144          |
| 0.0121        | 4.2207 | 545  | 0.0142          |
| 0.0113        | 4.2594 | 550  | 0.0140          |
| 0.0068        | 4.2982 | 555  | 0.0140          |
| 0.0095        | 4.3369 | 560  | 0.0140          |
| 0.0149        | 4.3756 | 565  | 0.0141          |
| 0.0063        | 4.4143 | 570  | 0.0141          |
| 0.0073        | 4.4530 | 575  | 0.0141          |
| 0.0114        | 4.4918 | 580  | 0.0142          |
| 0.0064        | 4.5305 | 585  | 0.0142          |
| 0.011         | 4.5692 | 590  | 0.0142          |
| 0.0088        | 4.6079 | 595  | 0.0142          |
| 0.0049        | 4.6467 | 600  | 0.0142          |
| 0.0079        | 4.6854 | 605  | 0.0142          |
| 0.0061        | 4.7241 | 610  | 0.0142          |
| 0.012         | 4.7628 | 615  | 0.0142          |
| 0.0107        | 4.8015 | 620  | 0.0142          |
| 0.0104        | 4.8403 | 625  | 0.0142          |
| 0.0117        | 4.8790 | 630  | 0.0142          |
| 0.013         | 4.9177 | 635  | 0.0142          |
| 0.0079        | 4.9564 | 640  | 0.0142          |
| 0.0082        | 4.9952 | 645  | 0.0142          |

### Framework versions

- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
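
The training hyperparameters listed above imply an effective batch size and a learning-rate curve that can be reproduced by hand. The following is a minimal sketch, not the training code itself. It assumes a single training device, takes the last logged step (645) as the total number of optimizer steps, and assumes the common shape of a linear warmup followed by a half-cosine decay to zero. The function names `effective_batch_size` and `cosine_lr` are hypothetical helpers introduced for illustration.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1

# Assumption: the last step logged in the results table is the final step.
TOTAL_STEPS = 645
# Assumption: training ran on a single device.
NUM_DEVICES = 1


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, total_steps, peak_lr, warmup_ratio):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0.0 (end of warmup) to 1.0 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
    print(f"effective batch size: {batch}")
    for step in (0, 65, 130, 355, 645):
        lr = cosine_lr(step, TOTAL_STEPS, LEARNING_RATE, WARMUP_RATIO)
        print(f"step {step:>3}: lr = {lr:.3e}")
```

Under these assumptions the product 4 × 8 × 1 matches the reported `total_train_batch_size` of 32, and the learning rate peaks at step 65 before decaying toward zero at step 645. A near-zero learning rate late in training is consistent with the validation loss staying flat at 0.0142 over the last logged steps.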