llm3br256-v1.5

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the asianpaints dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0157

Model description

More information needed

Intended uses & limitations

More information needed
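
Since the card does not yet document usage, below is a minimal loading-and-inference sketch. It assumes this is a PEFT adapter on top of the base model (consistent with the framework versions listed later); the repo id, dtype, and device settings are placeholders, not taken from the card.

```python
# Minimal inference sketch for a PEFT adapter on meta-llama/Llama-3.2-3B-Instruct.
# "your-org/llm3br256-v1.5" is a placeholder repo id, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "your-org/llm3br256-v1.5"  # placeholder; replace with the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```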

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25
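
For reproduction, the list above maps onto transformers TrainingArguments roughly as follows. This is a sketch, not the author's actual launch config: it assumes the listed batch sizes are per-device and that the standard Trainer stack was used, neither of which the card states.

```python
# Hedged sketch of TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256-v1.5",     # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=48,  # assumes the listed batch size is per-device
    per_device_eval_batch_size=48,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=25,
)
```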

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.1302        | 0.1208 | 25   | 0.1429          |
| 0.0909        | 0.2415 | 50   | 0.0968          |
| 0.062         | 0.3623 | 75   | 0.0748          |
| 0.0602        | 0.4831 | 100  | 0.0608          |
| 0.057         | 0.6039 | 125  | 0.0552          |
| 0.0456        | 0.7246 | 150  | 0.0501          |
| 0.0432        | 0.8454 | 175  | 0.0470          |
| 0.0416        | 0.9662 | 200  | 0.0447          |
| 0.0441        | 1.0870 | 225  | 0.0428          |
| 0.0348        | 1.2077 | 250  | 0.0405          |
| 0.0355        | 1.3285 | 275  | 0.0386          |
| 0.0379        | 1.4493 | 300  | 0.0358          |
| 0.032         | 1.5700 | 325  | 0.0354          |
| 0.0342        | 1.6908 | 350  | 0.0335          |
| 0.0318        | 1.8116 | 375  | 0.0324          |
| 0.031         | 1.9324 | 400  | 0.0318          |
| 0.0283        | 2.0531 | 425  | 0.0321          |
| 0.0275        | 2.1739 | 450  | 0.0337          |
| 0.026         | 2.2947 | 475  | 0.0314          |
| 0.0244        | 2.4155 | 500  | 0.0285          |
| 0.0281        | 2.5362 | 525  | 0.0285          |
| 0.0212        | 2.6570 | 550  | 0.0268          |
| 0.0221        | 2.7778 | 575  | 0.0267          |
| 0.0225        | 2.8986 | 600  | 0.0266          |
| 0.0264        | 3.0193 | 625  | 0.0292          |
| 0.0196        | 3.1401 | 650  | 0.0280          |
| 0.0185        | 3.2609 | 675  | 0.0264          |
| 0.0161        | 3.3816 | 700  | 0.0248          |
| 0.0186        | 3.5024 | 725  | 0.0226          |
| 0.0166        | 3.6232 | 750  | 0.0213          |
| 0.0141        | 3.7440 | 775  | 0.0215          |
| 0.0186        | 3.8647 | 800  | 0.0211          |
| 0.0119        | 3.9855 | 825  | 0.0204          |
| 0.0097        | 4.1063 | 850  | 0.0210          |
| 0.0095        | 4.2271 | 875  | 0.0204          |
| 0.0119        | 4.3478 | 900  | 0.0207          |
| 0.0131        | 4.4686 | 925  | 0.0257          |
| 0.0123        | 4.5894 | 950  | 0.0228          |
| 0.0133        | 4.7101 | 975  | 0.0204          |
| 0.0115        | 4.8309 | 1000 | 0.0191          |
| 0.0152        | 4.9517 | 1025 | 0.0201          |
| 0.0075        | 5.0725 | 1050 | 0.0188          |
| 0.0069        | 5.1932 | 1075 | 0.0169          |
| 0.0073        | 5.3140 | 1100 | 0.0182          |
| 0.0076        | 5.4348 | 1125 | 0.0166          |
| 0.0084        | 5.5556 | 1150 | 0.0173          |
| 0.0091        | 5.6763 | 1175 | 0.0175          |
| 0.0081        | 5.7971 | 1200 | 0.0176          |
| 0.0071        | 5.9179 | 1225 | 0.0175          |
| 0.0058        | 6.0386 | 1250 | 0.0187          |
| 0.0081        | 6.1594 | 1275 | 0.0165          |
| 0.0057        | 6.2802 | 1300 | 0.0171          |
| 0.0068        | 6.4010 | 1325 | 0.0165          |
| 0.0059        | 6.5217 | 1350 | 0.0163          |
| 0.0057        | 6.6425 | 1375 | 0.0151          |
| 0.0061        | 6.7633 | 1400 | 0.0164          |
| 0.006         | 6.8841 | 1425 | 0.0156          |
| 0.0062        | 7.0048 | 1450 | 0.0161          |
| 0.006         | 7.1256 | 1475 | 0.0178          |
| 0.0059        | 7.2464 | 1500 | 0.0169          |
| 0.0043        | 7.3671 | 1525 | 0.0175          |
| 0.0049        | 7.4879 | 1550 | 0.0178          |
| 0.0058        | 7.6087 | 1575 | 0.0156          |
| 0.0062        | 7.7295 | 1600 | 0.0158          |
| 0.0045        | 7.8502 | 1625 | 0.0151          |
| 0.0054        | 7.9710 | 1650 | 0.0150          |
| 0.0042        | 8.0918 | 1675 | 0.0157          |
| 0.0039        | 8.2126 | 1700 | 0.0157          |
| 0.0046        | 8.3333 | 1725 | 0.0170          |
| 0.0025        | 8.4541 | 1750 | 0.0154          |
| 0.0047        | 8.5749 | 1775 | 0.0156          |
| 0.0044        | 8.6957 | 1800 | 0.0166          |
| 0.0031        | 8.8164 | 1825 | 0.0172          |
| 0.0029        | 8.9372 | 1850 | 0.0167          |
| 0.0032        | 9.0580 | 1875 | 0.0169          |
| 0.0036        | 9.1787 | 1900 | 0.0167          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • PyTorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
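
To match this environment, the versions above can be pinned at install time; a sketch (package names are the standard PyPI ones, and the torch wheel/CUDA pairing may need adjusting for your platform):

```bash
pip install peft==0.12.0 transformers==4.46.1 torch==2.4.0 datasets==3.1.0 tokenizers==0.20.3
```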