	---
	library_name: transformers
	license: apache-2.0
	base_model: HuggingFaceTB/SmolLM2-135M
	tags:
	- generated_from_trainer
	model-index:
	- name: smolchess-v2
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# smolchess-v2

	This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.8569

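The card does not say how the loss is measured. Assuming it is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Trainer), it can be converted to a perplexity. A minimal sketch under that assumption:

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 0.8569

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under this assumption the reported loss corresponds to a perplexity of about 2.36 on the evaluation set.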
	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 4e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: grokadamw with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
	- lr_scheduler_type: cosine
	- num_epochs: 0.25

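To illustrate what the `cosine` scheduler does to the learning rate listed above, here is a minimal sketch of half-cosine decay. It assumes no warmup (none is listed) and a total of 399 optimizer steps, which is an estimate derived from the results table rather than a value stated in this card. The function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 4e-05     # learning_rate from the list above
TOTAL_STEPS = 399   # assumption: estimated from the results table, not stated in the card


def cosine_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps under half-cosine decay, no warmup."""
    progress = min(step, total_steps) / max(1, total_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 100, 200, 300, 396):
    print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate is close to zero by the last logged step, which would be consistent with the validation loss flattening out at 0.8569 in the final rows of the results table.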
	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 1.4864 | 0.0025 | 4 | 1.5472 |
	| 1.3163 | 0.0050 | 8 | 1.2616 |
	| 1.0354 | 0.0075 | 12 | 1.1857 |
	| 1.2466 | 0.0100 | 16 | 1.1447 |
	| 1.1801 | 0.0125 | 20 | 1.1176 |
	| 1.208 | 0.0150 | 24 | 1.1092 |
	| 1.0723 | 0.0176 | 28 | 1.0780 |
	| 1.1895 | 0.0201 | 32 | 1.0760 |
	| 1.1358 | 0.0226 | 36 | 1.0562 |
	| 1.0817 | 0.0251 | 40 | 1.0554 |
	| 0.9674 | 0.0276 | 44 | 1.0419 |
	| 0.9832 | 0.0301 | 48 | 1.0245 |
	| 1.0241 | 0.0326 | 52 | 1.0178 |
	| 0.9553 | 0.0351 | 56 | 1.0115 |
	| 1.0715 | 0.0376 | 60 | 1.0027 |
	| 1.1014 | 0.0401 | 64 | 0.9965 |
	| 1.0304 | 0.0426 | 68 | 0.9954 |
	| 0.9906 | 0.0451 | 72 | 0.9879 |
	| 0.9631 | 0.0476 | 76 | 0.9769 |
	| 0.986 | 0.0502 | 80 | 0.9720 |
	| 1.0233 | 0.0527 | 84 | 0.9675 |
	| 0.9323 | 0.0552 | 88 | 0.9612 |
	| 0.9303 | 0.0577 | 92 | 0.9569 |
	| 1.0258 | 0.0602 | 96 | 0.9520 |
	| 0.9946 | 0.0627 | 100 | 0.9527 |
	| 0.9568 | 0.0652 | 104 | 0.9425 |
	| 0.9674 | 0.0677 | 108 | 0.9435 |
	| 0.9627 | 0.0702 | 112 | 0.9378 |
	| 0.9755 | 0.0727 | 116 | 0.9338 |
	| 0.8511 | 0.0752 | 120 | 0.9306 |
	| 0.989 | 0.0777 | 124 | 0.9292 |
	| 0.9635 | 0.0803 | 128 | 0.9272 |
	| 0.9412 | 0.0828 | 132 | 0.9263 |
	| 0.8605 | 0.0853 | 136 | 0.9228 |
	| 0.8503 | 0.0878 | 140 | 0.9206 |
	| 0.8976 | 0.0903 | 144 | 0.9155 |
	| 0.9029 | 0.0928 | 148 | 0.9143 |
	| 0.9335 | 0.0953 | 152 | 0.9103 |
	| 0.944 | 0.0978 | 156 | 0.9073 |
	| 0.8948 | 0.1003 | 160 | 0.9058 |
	| 0.8921 | 0.1028 | 164 | 0.9032 |
	| 0.9948 | 0.1053 | 168 | 0.9028 |
	| 0.8968 | 0.1078 | 172 | 0.9003 |
	| 0.8908 | 0.1103 | 176 | 0.8982 |
	| 0.9119 | 0.1129 | 180 | 0.8979 |
	| 0.842 | 0.1154 | 184 | 0.8942 |
	| 0.7497 | 0.1179 | 188 | 0.8930 |
	| 0.9294 | 0.1204 | 192 | 0.8922 |
	| 0.8184 | 0.1229 | 196 | 0.8891 |
	| 0.941 | 0.1254 | 200 | 0.8883 |
	| 0.8884 | 0.1279 | 204 | 0.8851 |
	| 0.8975 | 0.1304 | 208 | 0.8851 |
	| 0.9205 | 0.1329 | 212 | 0.8847 |
	| 0.8663 | 0.1354 | 216 | 0.8815 |
	| 0.8455 | 0.1379 | 220 | 0.8812 |
	| 0.921 | 0.1404 | 224 | 0.8794 |
	| 0.9493 | 0.1429 | 228 | 0.8784 |
	| 0.8949 | 0.1455 | 232 | 0.8792 |
	| 0.8886 | 0.1480 | 236 | 0.8773 |
	| 0.8808 | 0.1505 | 240 | 0.8760 |
	| 0.8768 | 0.1530 | 244 | 0.8750 |
	| 0.9354 | 0.1555 | 248 | 0.8727 |
	| 0.8512 | 0.1580 | 252 | 0.8721 |
	| 0.8355 | 0.1605 | 256 | 0.8717 |
	| 0.7923 | 0.1630 | 260 | 0.8699 |
	| 0.9027 | 0.1655 | 264 | 0.8691 |
	| 0.8264 | 0.1680 | 268 | 0.8681 |
	| 0.9199 | 0.1705 | 272 | 0.8683 |
	| 0.8792 | 0.1730 | 276 | 0.8666 |
	| 0.9347 | 0.1755 | 280 | 0.8664 |
	| 0.8988 | 0.1781 | 284 | 0.8652 |
	| 0.889 | 0.1806 | 288 | 0.8646 |
	| 0.917 | 0.1831 | 292 | 0.8633 |
	| 0.9206 | 0.1856 | 296 | 0.8628 |
	| 0.9127 | 0.1881 | 300 | 0.8629 |
	| 0.6946 | 0.1906 | 304 | 0.8618 |
	| 0.9499 | 0.1931 | 308 | 0.8612 |
	| 0.8798 | 0.1956 | 312 | 0.8610 |
	| 0.8857 | 0.1981 | 316 | 0.8610 |
	| 0.9356 | 0.2006 | 320 | 0.8604 |
	| 0.8134 | 0.2031 | 324 | 0.8597 |
	| 0.9214 | 0.2056 | 328 | 0.8592 |
	| 0.8907 | 0.2082 | 332 | 0.8590 |
	| 0.8309 | 0.2107 | 336 | 0.8588 |
	| 0.8386 | 0.2132 | 340 | 0.8584 |
	| 0.8001 | 0.2157 | 344 | 0.8583 |
	| 0.8452 | 0.2182 | 348 | 0.8580 |
	| 0.7587 | 0.2207 | 352 | 0.8578 |
	| 0.8155 | 0.2232 | 356 | 0.8576 |
	| 0.7179 | 0.2257 | 360 | 0.8575 |
	| 0.8231 | 0.2282 | 364 | 0.8573 |
	| 0.8984 | 0.2307 | 368 | 0.8572 |
	| 0.8501 | 0.2332 | 372 | 0.8571 |
	| 0.8512 | 0.2357 | 376 | 0.8570 |
	| 0.8554 | 0.2382 | 380 | 0.8570 |
	| 0.9082 | 0.2408 | 384 | 0.8570 |
	| 0.8617 | 0.2433 | 388 | 0.8569 |
	| 0.8845 | 0.2458 | 392 | 0.8569 |
	| 0.9595 | 0.2483 | 396 | 0.8569 |

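The step and epoch columns above allow a rough estimate of how much data the run covered. The sketch below assumes a single device and no gradient accumulation (neither is listed in this card), so that every step consumes `train_batch_size` examples. The epoch column is rounded to four decimals, so the derived figures are approximate.

```python
# Last logged row of the table above and the batch size from the hyperparameters.
last_step = 396
last_epoch = 0.2483
train_batch_size = 8

# Assumption: one device and no gradient accumulation, so each step uses 8 examples.
examples_seen = last_step * train_batch_size
steps_per_epoch = last_step / last_epoch
approx_train_examples = steps_per_epoch * train_batch_size

print(f"examples seen by step {last_step}: {examples_seen}")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
print(f"estimated training-set size: {approx_train_examples:.0f}")
```

Under these assumptions the model saw 3168 training examples by the last logged step, out of a training set on the order of 12,800 examples.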

	### Framework versions

	- Transformers 4.46.1
	- PyTorch 2.5.0+cu121
	- Datasets 3.1.0
	- Tokenizers 0.20.1
