llama3.1-8B-eeszt-structured

This model is a fine-tuned version of meta-llama/Llama-3.1-8B on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3304
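
Assuming this is the mean token-level cross-entropy loss that the Trainer reports, it corresponds to an evaluation perplexity of exp(1.3304) ≈ 3.78.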

Model description

More information needed

Intended uses & limitations

More information needed
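
The checkpoint ships as a PEFT adapter on top of meta-llama/Llama-3.1-8B (see the framework versions below), so a minimal loading sketch might look like the following. The repo id is taken from this card, the dtype/device settings are illustrative, and the prompt is a placeholder since the expected prompt format is not documented; access to the gated base model is required.

```python
# Minimal sketch, assuming the checkpoint is a PEFT (LoRA-style) adapter
# for meta-llama/Llama-3.1-8B. dtype/device settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B"
adapter_id = "aborcs/llama3.1-8B-eeszt-structured"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "..."  # placeholder; the expected prompt format is not documented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```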

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map onto TrainingArguments follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: paged_adamw_32bit with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 500
  • mixed_precision_training: Native AMP
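
A hedged sketch of how these values might map onto transformers.TrainingArguments; the actual training script is not part of this card, and output_dir and the precision flag are assumptions:

```python
# Sketch only: reconstructing the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3.1-8B-eeszt-structured",  # assumed output path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,   # 4 x 4 = 16 effective train batch size
    optim="paged_adamw_32bit",       # paged optimizer from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=500,
    fp16=True,  # "Native AMP"; bf16=True is equally plausible on Ampere+ GPUs
)
```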

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0.8889 | 4 | 1.6790 |
| 2.1272 | 1.8333 | 8 | 1.5754 |
| 1.8869 | 2.7778 | 12 | 1.4449 |
| 1.8458 | 3.9444 | 17 | 1.3113 |
| 1.5497 | 4.8889 | 21 | 1.2161 |
| 1.4996 | 5.8333 | 25 | 1.1479 |
| 1.4996 | 6.7778 | 29 | 1.0829 |
| 1.5 | 7.9444 | 34 | 1.0096 |
| 1.1576 | 8.8889 | 38 | 0.9470 |
| 1.1188 | 9.8333 | 42 | 0.9070 |
| 0.881 | 10.7778 | 46 | 0.8688 |
| 0.9199 | 11.9444 | 51 | 0.8224 |
| 0.7161 | 12.8889 | 55 | 0.7994 |
| 0.7161 | 13.8333 | 59 | 0.7957 |
| 0.7983 | 14.7778 | 63 | 0.7891 |
| 0.5833 | 15.9444 | 68 | 0.7692 |
| 0.5577 | 16.8889 | 72 | 0.7593 |
| 0.4911 | 17.8333 | 76 | 0.7867 |
| 0.4478 | 18.7778 | 80 | 0.8088 |
| 0.5181 | 19.9444 | 85 | 0.8089 |
| 0.5181 | 20.8889 | 89 | 0.7761 |
| 0.3977 | 21.8333 | 93 | 0.7940 |
| 0.3655 | 22.7778 | 97 | 0.8387 |
| 0.293 | 23.9444 | 102 | 0.8603 |
| 0.2978 | 24.8889 | 106 | 0.8603 |
| 0.2573 | 25.8333 | 110 | 0.8431 |
| 0.2573 | 26.7778 | 114 | 0.9431 |
| 0.2802 | 27.9444 | 119 | 0.9213 |
| 0.2116 | 28.8889 | 123 | 0.9327 |
| 0.208 | 29.8333 | 127 | 0.9562 |
| 0.2012 | 30.7778 | 131 | 0.9036 |
| 0.1807 | 31.9444 | 136 | 0.9352 |
| 0.1885 | 32.8889 | 140 | 1.0403 |
| 0.1885 | 33.8333 | 144 | 0.9444 |
| 0.1898 | 34.7778 | 148 | 0.9924 |
| 0.1504 | 35.9444 | 153 | 1.0616 |
| 0.14 | 36.8889 | 157 | 0.9799 |
| 0.1428 | 37.8333 | 161 | 1.0503 |
| 0.1174 | 38.7778 | 165 | 1.0565 |
| 0.1513 | 39.9444 | 170 | 1.0090 |
| 0.1513 | 40.8889 | 174 | 1.0892 |
| 0.1053 | 41.8333 | 178 | 1.0162 |
| 0.1056 | 42.7778 | 182 | 1.1173 |
| 0.1127 | 43.9444 | 187 | 1.0811 |
| 0.0927 | 44.8889 | 191 | 1.0970 |
| 0.0963 | 45.8333 | 195 | 1.0959 |
| 0.0963 | 46.7778 | 199 | 1.0603 |
| 0.1043 | 47.9444 | 204 | 1.1082 |
| 0.0845 | 48.8889 | 208 | 1.0794 |
| 0.0728 | 49.8333 | 212 | 1.1056 |
| 0.0779 | 50.7778 | 216 | 1.1265 |
| 0.0706 | 51.9444 | 221 | 1.1261 |
| 0.06 | 52.8889 | 225 | 1.1191 |
| 0.06 | 53.8333 | 229 | 1.1820 |
| 0.0692 | 54.7778 | 233 | 1.1651 |
| 0.0558 | 55.9444 | 238 | 1.1954 |
| 0.0529 | 56.8889 | 242 | 1.1271 |
| 0.054 | 57.8333 | 246 | 1.0981 |
| 0.0491 | 58.7778 | 250 | 1.1937 |
| 0.0588 | 59.9444 | 255 | 1.1734 |
| 0.0588 | 60.8889 | 259 | 1.2405 |
| 0.0435 | 61.8333 | 263 | 1.1687 |
| 0.0394 | 62.7778 | 267 | 1.1928 |
| 0.0446 | 63.9444 | 272 | 1.2214 |
| 0.0414 | 64.8889 | 276 | 1.2216 |
| 0.0378 | 65.8333 | 280 | 1.2238 |
| 0.0378 | 66.7778 | 284 | 1.2372 |
| 0.0455 | 67.9444 | 289 | 1.2214 |
| 0.0377 | 68.8889 | 293 | 1.2555 |
| 0.0327 | 69.8333 | 297 | 1.2370 |
| 0.033 | 70.7778 | 301 | 1.2383 |
| 0.0342 | 71.9444 | 306 | 1.2499 |
| 0.032 | 72.8889 | 310 | 1.2769 |
| 0.032 | 73.8333 | 314 | 1.2521 |
| 0.0389 | 74.7778 | 318 | 1.2544 |
| 0.0312 | 75.9444 | 323 | 1.2710 |
| 0.0294 | 76.8889 | 327 | 1.2853 |
| 0.0269 | 77.8333 | 331 | 1.2947 |
| 0.028 | 78.7778 | 335 | 1.3076 |
| 0.0334 | 79.9444 | 340 | 1.3095 |
| 0.0334 | 80.8889 | 344 | 1.2938 |
| 0.0257 | 81.8333 | 348 | 1.2813 |
| 0.0265 | 82.7778 | 352 | 1.2840 |
| 0.0262 | 83.9444 | 357 | 1.2902 |
| 0.0243 | 84.8889 | 361 | 1.3001 |
| 0.0232 | 85.8333 | 365 | 1.3042 |
| 0.0232 | 86.7778 | 369 | 1.3044 |
| 0.027 | 87.9444 | 374 | 1.2909 |
| 0.0224 | 88.8889 | 378 | 1.2925 |
| 0.0239 | 89.8333 | 382 | 1.2949 |
| 0.0221 | 90.7778 | 386 | 1.3046 |
| 0.0244 | 91.9444 | 391 | 1.3120 |
| 0.0256 | 92.8889 | 395 | 1.3179 |
| 0.0256 | 93.8333 | 399 | 1.3150 |
| 0.0276 | 94.7778 | 403 | 1.3069 |
| 0.0226 | 95.9444 | 408 | 1.2978 |
| 0.0279 | 96.8889 | 412 | 1.2995 |
| 0.0218 | 97.8333 | 416 | 1.3054 |
| 0.0224 | 98.7778 | 420 | 1.3163 |
| 0.0236 | 99.9444 | 425 | 1.3296 |
| 0.0236 | 100.8889 | 429 | 1.3317 |
| 0.021 | 101.8333 | 433 | 1.3305 |
| 0.0208 | 102.7778 | 437 | 1.3273 |
| 0.0205 | 103.9444 | 442 | 1.3253 |
| 0.0213 | 104.8889 | 446 | 1.3249 |
| 0.0208 | 105.8333 | 450 | 1.3257 |
| 0.0208 | 106.7778 | 454 | 1.3263 |
| 0.0221 | 107.9444 | 459 | 1.3271 |
| 0.0223 | 108.8889 | 463 | 1.3279 |
| 0.0194 | 109.8333 | 467 | 1.3291 |
| 0.0207 | 110.7778 | 471 | 1.3293 |
| 0.0211 | 111.9444 | 476 | 1.3296 |
| 0.0193 | 112.8889 | 480 | 1.3302 |
| 0.0193 | 113.8333 | 484 | 1.3301 |
| 0.0217 | 114.7778 | 488 | 1.3295 |
| 0.0201 | 115.9444 | 493 | 1.3301 |
| 0.0201 | 116.8889 | 497 | 1.3305 |
| 0.0201 | 117.6111 | 500 | 1.3304 |
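
Validation loss bottoms out around 0.7593 at step 72 and trends upward afterward, so the final step-500 adapter is likely overfit relative to the best intermediate checkpoint. If rerunning this training, flags along the following lines (hypothetical additions, not part of the original run) would retain the lowest-loss checkpoint instead of the last one:

```python
# Hypothetical sketch: keep the checkpoint with the lowest eval loss.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3.1-8B-eeszt-structured",  # assumed output path
    eval_strategy="epoch",                      # must match save_strategy
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```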

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.2
  • Tokenizers 0.20.1