---
library_name: peft
license: other
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- llama-factory
- lora
- unsloth
- generated_from_trainer
model-index:
- name: llm3br256
  results: []
---
# llm3br256
This model is a fine-tuned version of [unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct) on the dbischof_premise_aea dataset. It achieves the following results on the evaluation set:
- Loss: 0.0136
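For quick use with `peft`, a minimal inference sketch is below. The adapter repo id (`your-username/llm3br256`) and the prompt are placeholders; the base model follows the `base_model` field above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Llama-3.2-3B-Instruct"
adapter_id = "your-username/llm3br256"  # hypothetical repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Build a chat-formatted prompt and generate a completion.
messages = [{"role": "user", "content": "Hello!"}]  # placeholder prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For deployment, the adapter could also be folded into the base weights with `model.merge_and_unload()`.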
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a rough `transformers` equivalent is sketched after the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
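The run itself used LLaMA-Factory (see the tags), so the sketch below is only an approximate `transformers`/`peft` equivalent of the settings above; the `LoraConfig` values are assumptions, not stated in the card.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Assumption: the LoRA rank, alpha, and target modules are not stated in the
# card; r=256 is only a guess suggested by the model name "llm3br256".
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,  # 4 per device * 8 steps = total batch 32
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```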
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
0.0896 | 0.0387 | 5 | 0.0767 |
0.057 | 0.0774 | 10 | 0.0397 |
0.0361 | 0.1162 | 15 | 0.0325 |
0.0478 | 0.1549 | 20 | 0.0304 |
0.0293 | 0.1936 | 25 | 0.0270 |
0.0429 | 0.2323 | 30 | 0.0253 |
0.0368 | 0.2711 | 35 | 0.0244 |
0.0323 | 0.3098 | 40 | 0.0229 |
0.0223 | 0.3485 | 45 | 0.0225 |
0.0327 | 0.3872 | 50 | 0.0216 |
0.0237 | 0.4259 | 55 | 0.0209 |
0.0255 | 0.4647 | 60 | 0.0204 |
0.0237 | 0.5034 | 65 | 0.0197 |
0.0273 | 0.5421 | 70 | 0.0197 |
0.0192 | 0.5808 | 75 | 0.0192 |
0.0459 | 0.6196 | 80 | 0.0188 |
0.0203 | 0.6583 | 85 | 0.0185 |
0.032 | 0.6970 | 90 | 0.0183 |
0.0145 | 0.7357 | 95 | 0.0184 |
0.0299 | 0.7744 | 100 | 0.0181 |
0.0186 | 0.8132 | 105 | 0.0183 |
0.0255 | 0.8519 | 110 | 0.0178 |
0.0199 | 0.8906 | 115 | 0.0177 |
0.0216 | 0.9293 | 120 | 0.0173 |
0.024 | 0.9681 | 125 | 0.0176 |
0.0319 | 1.0068 | 130 | 0.0173 |
0.0202 | 1.0455 | 135 | 0.0176 |
0.0167 | 1.0842 | 140 | 0.0171 |
0.0205 | 1.1229 | 145 | 0.0168 |
0.0164 | 1.1617 | 150 | 0.0167 |
0.0303 | 1.2004 | 155 | 0.0168 |
0.0201 | 1.2391 | 160 | 0.0165 |
0.0183 | 1.2778 | 165 | 0.0164 |
0.0221 | 1.3166 | 170 | 0.0163 |
0.0132 | 1.3553 | 175 | 0.0162 |
0.0226 | 1.3940 | 180 | 0.0158 |
0.0173 | 1.4327 | 185 | 0.0159 |
0.0304 | 1.4714 | 190 | 0.0164 |
0.0177 | 1.5102 | 195 | 0.0161 |
0.0155 | 1.5489 | 200 | 0.0160 |
0.0258 | 1.5876 | 205 | 0.0159 |
0.0217 | 1.6263 | 210 | 0.0163 |
0.0197 | 1.6651 | 215 | 0.0161 |
0.0124 | 1.7038 | 220 | 0.0158 |
0.0248 | 1.7425 | 225 | 0.0156 |
0.017 | 1.7812 | 230 | 0.0159 |
0.0248 | 1.8199 | 235 | 0.0158 |
0.0189 | 1.8587 | 240 | 0.0155 |
0.0185 | 1.8974 | 245 | 0.0151 |
0.0154 | 1.9361 | 250 | 0.0151 |
0.0223 | 1.9748 | 255 | 0.0152 |
0.0161 | 2.0136 | 260 | 0.0152 |
0.0139 | 2.0523 | 265 | 0.0154 |
0.0173 | 2.0910 | 270 | 0.0153 |
0.0237 | 2.1297 | 275 | 0.0152 |
0.0167 | 2.1684 | 280 | 0.0151 |
0.0086 | 2.2072 | 285 | 0.0149 |
0.012 | 2.2459 | 290 | 0.0147 |
0.015 | 2.2846 | 295 | 0.0149 |
0.0165 | 2.3233 | 300 | 0.0151 |
0.0183 | 2.3621 | 305 | 0.0150 |
0.0233 | 2.4008 | 310 | 0.0151 |
0.0163 | 2.4395 | 315 | 0.0149 |
0.0121 | 2.4782 | 320 | 0.0147 |
0.0213 | 2.5169 | 325 | 0.0145 |
0.0253 | 2.5557 | 330 | 0.0145 |
0.023 | 2.5944 | 335 | 0.0149 |
0.014 | 2.6331 | 340 | 0.0144 |
0.0156 | 2.6718 | 345 | 0.0145 |
0.0164 | 2.7106 | 350 | 0.0143 |
0.0262 | 2.7493 | 355 | 0.0140 |
0.0134 | 2.7880 | 360 | 0.0142 |
0.018 | 2.8267 | 365 | 0.0144 |
0.0166 | 2.8654 | 370 | 0.0145 |
0.0204 | 2.9042 | 375 | 0.0141 |
0.0284 | 2.9429 | 380 | 0.0139 |
0.021 | 2.9816 | 385 | 0.0139 |
0.0125 | 3.0203 | 390 | 0.0145 |
0.0157 | 3.0591 | 395 | 0.0145 |
0.0136 | 3.0978 | 400 | 0.0142 |
0.0087 | 3.1365 | 405 | 0.0141 |
0.0217 | 3.1752 | 410 | 0.0139 |
0.0125 | 3.2139 | 415 | 0.0136 |
0.0115 | 3.2527 | 420 | 0.0138 |
0.0128 | 3.2914 | 425 | 0.0139 |
0.0278 | 3.3301 | 430 | 0.0138 |
0.0197 | 3.3688 | 435 | 0.0136 |
0.0095 | 3.4076 | 440 | 0.0133 |
0.0075 | 3.4463 | 445 | 0.0133 |
0.0112 | 3.4850 | 450 | 0.0136 |
0.0129 | 3.5237 | 455 | 0.0137 |
0.011 | 3.5624 | 460 | 0.0136 |
0.0233 | 3.6012 | 465 | 0.0136 |
0.0132 | 3.6399 | 470 | 0.0134 |
0.0147 | 3.6786 | 475 | 0.0136 |
0.0073 | 3.7173 | 480 | 0.0136 |
0.0143 | 3.7561 | 485 | 0.0136 |
0.0086 | 3.7948 | 490 | 0.0137 |
0.0055 | 3.8335 | 495 | 0.0138 |
0.0108 | 3.8722 | 500 | 0.0138 |
0.0079 | 3.9109 | 505 | 0.0136 |
0.0105 | 3.9497 | 510 | 0.0133 |
0.0117 | 3.9884 | 515 | 0.0133 |
0.008 | 4.0271 | 520 | 0.0135 |
0.0147 | 4.0658 | 525 | 0.0137 |
0.007 | 4.1045 | 530 | 0.0143 |
0.0059 | 4.1433 | 535 | 0.0146 |
0.015 | 4.1820 | 540 | 0.0144 |
0.0121 | 4.2207 | 545 | 0.0142 |
0.0113 | 4.2594 | 550 | 0.0140 |
0.0068 | 4.2982 | 555 | 0.0140 |
0.0095 | 4.3369 | 560 | 0.0140 |
0.0149 | 4.3756 | 565 | 0.0141 |
0.0063 | 4.4143 | 570 | 0.0141 |
0.0073 | 4.4530 | 575 | 0.0141 |
0.0114 | 4.4918 | 580 | 0.0142 |
0.0064 | 4.5305 | 585 | 0.0142 |
0.011 | 4.5692 | 590 | 0.0142 |
0.0088 | 4.6079 | 595 | 0.0142 |
0.0049 | 4.6467 | 600 | 0.0142 |
0.0079 | 4.6854 | 605 | 0.0142 |
0.0061 | 4.7241 | 610 | 0.0142 |
0.012 | 4.7628 | 615 | 0.0142 |
0.0107 | 4.8015 | 620 | 0.0142 |
0.0104 | 4.8403 | 625 | 0.0142 |
0.0117 | 4.8790 | 630 | 0.0142 |
0.013 | 4.9177 | 635 | 0.0142 |
0.0079 | 4.9564 | 640 | 0.0142 |
0.0082 | 4.9952 | 645 | 0.0142 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- PyTorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3