---
library_name: transformers
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M
tags:
  - generated_from_trainer
model-index:
  - name: smolchess-v2
    results: []
---

# smolchess-v2

This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.8569
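
Since the card does not document usage, here is a minimal loading sketch with the standard `transformers` API. The Hub repo id `nlpguy/smolchess-v2` and the algebraic-notation prompt format are assumptions, as the training data is listed as unknown:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nlpguy/smolchess-v2"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt with a move sequence; this format is a guess, since the
# training dataset and its notation are not documented.
prompt = "1. e4 e5 2. Nf3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```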

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):

- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: GrokAdamW with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 0.25
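
As referenced above, a sketch reconstructing this configuration as `TrainingArguments`, assuming the `optim="grokadamw"` option available in Transformers 4.46 (which requires the separate `grokadamw` package); `output_dir` and any arguments not listed are assumptions, not the author's settings:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="smolchess-v2",   # assumed
    learning_rate=4e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="grokadamw",           # needs `pip install grokadamw`
    adam_beta1=0.9,              # betas=(0.9, 0.999) as reported
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    num_train_epochs=0.25,
)
```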

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4864        | 0.0025 | 4    | 1.5472          |
| 1.3163        | 0.0050 | 8    | 1.2616          |
| 1.0354        | 0.0075 | 12   | 1.1857          |
| 1.2466        | 0.0100 | 16   | 1.1447          |
| 1.1801        | 0.0125 | 20   | 1.1176          |
| 1.208         | 0.0150 | 24   | 1.1092          |
| 1.0723        | 0.0176 | 28   | 1.0780          |
| 1.1895        | 0.0201 | 32   | 1.0760          |
| 1.1358        | 0.0226 | 36   | 1.0562          |
| 1.0817        | 0.0251 | 40   | 1.0554          |
| 0.9674        | 0.0276 | 44   | 1.0419          |
| 0.9832        | 0.0301 | 48   | 1.0245          |
| 1.0241        | 0.0326 | 52   | 1.0178          |
| 0.9553        | 0.0351 | 56   | 1.0115          |
| 1.0715        | 0.0376 | 60   | 1.0027          |
| 1.1014        | 0.0401 | 64   | 0.9965          |
| 1.0304        | 0.0426 | 68   | 0.9954          |
| 0.9906        | 0.0451 | 72   | 0.9879          |
| 0.9631        | 0.0476 | 76   | 0.9769          |
| 0.986         | 0.0502 | 80   | 0.9720          |
| 1.0233        | 0.0527 | 84   | 0.9675          |
| 0.9323        | 0.0552 | 88   | 0.9612          |
| 0.9303        | 0.0577 | 92   | 0.9569          |
| 1.0258        | 0.0602 | 96   | 0.9520          |
| 0.9946        | 0.0627 | 100  | 0.9527          |
| 0.9568        | 0.0652 | 104  | 0.9425          |
| 0.9674        | 0.0677 | 108  | 0.9435          |
| 0.9627        | 0.0702 | 112  | 0.9378          |
| 0.9755        | 0.0727 | 116  | 0.9338          |
| 0.8511        | 0.0752 | 120  | 0.9306          |
| 0.989         | 0.0777 | 124  | 0.9292          |
| 0.9635        | 0.0803 | 128  | 0.9272          |
| 0.9412        | 0.0828 | 132  | 0.9263          |
| 0.8605        | 0.0853 | 136  | 0.9228          |
| 0.8503        | 0.0878 | 140  | 0.9206          |
| 0.8976        | 0.0903 | 144  | 0.9155          |
| 0.9029        | 0.0928 | 148  | 0.9143          |
| 0.9335        | 0.0953 | 152  | 0.9103          |
| 0.944         | 0.0978 | 156  | 0.9073          |
| 0.8948        | 0.1003 | 160  | 0.9058          |
| 0.8921        | 0.1028 | 164  | 0.9032          |
| 0.9948        | 0.1053 | 168  | 0.9028          |
| 0.8968        | 0.1078 | 172  | 0.9003          |
| 0.8908        | 0.1103 | 176  | 0.8982          |
| 0.9119        | 0.1129 | 180  | 0.8979          |
| 0.842         | 0.1154 | 184  | 0.8942          |
| 0.7497        | 0.1179 | 188  | 0.8930          |
| 0.9294        | 0.1204 | 192  | 0.8922          |
| 0.8184        | 0.1229 | 196  | 0.8891          |
| 0.941         | 0.1254 | 200  | 0.8883          |
| 0.8884        | 0.1279 | 204  | 0.8851          |
| 0.8975        | 0.1304 | 208  | 0.8851          |
| 0.9205        | 0.1329 | 212  | 0.8847          |
| 0.8663        | 0.1354 | 216  | 0.8815          |
| 0.8455        | 0.1379 | 220  | 0.8812          |
| 0.921         | 0.1404 | 224  | 0.8794          |
| 0.9493        | 0.1429 | 228  | 0.8784          |
| 0.8949        | 0.1455 | 232  | 0.8792          |
| 0.8886        | 0.1480 | 236  | 0.8773          |
| 0.8808        | 0.1505 | 240  | 0.8760          |
| 0.8768        | 0.1530 | 244  | 0.8750          |
| 0.9354        | 0.1555 | 248  | 0.8727          |
| 0.8512        | 0.1580 | 252  | 0.8721          |
| 0.8355        | 0.1605 | 256  | 0.8717          |
| 0.7923        | 0.1630 | 260  | 0.8699          |
| 0.9027        | 0.1655 | 264  | 0.8691          |
| 0.8264        | 0.1680 | 268  | 0.8681          |
| 0.9199        | 0.1705 | 272  | 0.8683          |
| 0.8792        | 0.1730 | 276  | 0.8666          |
| 0.9347        | 0.1755 | 280  | 0.8664          |
| 0.8988        | 0.1781 | 284  | 0.8652          |
| 0.889         | 0.1806 | 288  | 0.8646          |
| 0.917         | 0.1831 | 292  | 0.8633          |
| 0.9206        | 0.1856 | 296  | 0.8628          |
| 0.9127        | 0.1881 | 300  | 0.8629          |
| 0.6946        | 0.1906 | 304  | 0.8618          |
| 0.9499        | 0.1931 | 308  | 0.8612          |
| 0.8798        | 0.1956 | 312  | 0.8610          |
| 0.8857        | 0.1981 | 316  | 0.8610          |
| 0.9356        | 0.2006 | 320  | 0.8604          |
| 0.8134        | 0.2031 | 324  | 0.8597          |
| 0.9214        | 0.2056 | 328  | 0.8592          |
| 0.8907        | 0.2082 | 332  | 0.8590          |
| 0.8309        | 0.2107 | 336  | 0.8588          |
| 0.8386        | 0.2132 | 340  | 0.8584          |
| 0.8001        | 0.2157 | 344  | 0.8583          |
| 0.8452        | 0.2182 | 348  | 0.8580          |
| 0.7587        | 0.2207 | 352  | 0.8578          |
| 0.8155        | 0.2232 | 356  | 0.8576          |
| 0.7179        | 0.2257 | 360  | 0.8575          |
| 0.8231        | 0.2282 | 364  | 0.8573          |
| 0.8984        | 0.2307 | 368  | 0.8572          |
| 0.8501        | 0.2332 | 372  | 0.8571          |
| 0.8512        | 0.2357 | 376  | 0.8570          |
| 0.8554        | 0.2382 | 380  | 0.8570          |
| 0.9082        | 0.2408 | 384  | 0.8570          |
| 0.8617        | 0.2433 | 388  | 0.8569          |
| 0.8845        | 0.2458 | 392  | 0.8569          |
| 0.9595        | 0.2483 | 396  | 0.8569          |

### Framework versions

- Transformers 4.46.1
- Pytorch 2.5.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.1