	---
	license: mit
	datasets:
	- nelson2424/Chess_openings_dataset
	language:
	- en
	metrics:
	- perplexity
	- accuracy
	pipeline_tag: text-generation
	---

	This model predicts moves in chess openings.
	The goal is to test how different ways of modeling the game text affect the results, and to report those results.
	The training code is available [here](https://github.com/bit2424/chess_openings_teacher/tree/main/ML/Training).
	The different model configurations and results are available [here](https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor).

	# Training process:
	- ## Training with V1_small dataset:
	To follow the discussion below, it is important to first check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.

	During the training process, multiple challenges arose.
	- The first problem was the <b>low accuracy</b> of the model's predictions. To mitigate it, I tried the following:
	- <b>Learning rate:</b>
	The first approach was to modify the learning rate. <br>
	A further step was to change the learning rate scheduler from linear decay to polynomial decay.
	Neither change had a significant effect on accuracy.
	- <b>Probability of masked tokens:</b>
	Decreasing the probability of masking tokens in the dataset increased accuracy, but at the expense of weaker predictive capability.
	A model trained with a low masking probability is unable to predict correct moves across different openings.
	- <b>Focus on predicting the moves:</b>
	The current model tries to model the whole text that the V1_small version of the dataset provides.
	<br>This includes predicting parts of the board after a move and parts of the opening name, as seen in the following example:
	~~~
	<s>King's Indian <mask>: <mask> Variation, Debrecen Defense
	r n b q k b n r
	p p p p p p p p
	. . . . . . . .
	. . . . . . . .
	. . P . . . . .
	. . . . . . . .
	P P . P <mask> P P P
	R N B Q K B N R
	m:g8f6
	<mask>:<mask>b<mask><mask> b q k b . r
	p p p p p p p p
	. . . . . n . .
	. . . . . . . .
	. . P . . . . .
	. . . . . . . .
	P P . P P P P P
	R N B Q K B N R
	m:b1c3
	<mask><mask><mask>
	<mask><mask> b q k b . r
	p p p p p p p p
	. . . . . n . .
	. . . . . . . .
	. . P . . . . .
	. . N . . . . .
	P P . P P P P P
	R . B Q K B N'
	~~~
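
The example above is consistent with masking applied at random across the whole text. A minimal sketch of that idea follows, assuming each token is masked independently with a fixed probability. The function name `mask_tokens`, the whitespace tokenization, and the probability values are illustrative assumptions, not the actual training code.

```python
import random

MASK = "<mask>"  # mask token as it appears in the dataset samples


def mask_tokens(tokens, mask_prob, seed=0):
    """Replace each token with MASK independently with probability mask_prob.

    Hypothetical helper: the real pipeline may tokenize and mask differently.
    """
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


# Whitespace tokens are an assumption made only for this illustration.
row = "P P . P P P P P".split()
print(" ".join(mask_tokens(row, mask_prob=0.05)))  # few or no masks
print(" ".join(mask_tokens(row, mask_prob=0.50)))  # roughly half masked
```

With a lower `mask_prob` each sample hides fewer tokens, which makes the prediction task easier and matches the trade-off described above.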

	<br> After realizing that, due to limited computational resources, my model could not learn a function complex enough to model the whole problem, I decided to narrow the scope.
	<br> Instead of trying to generate the whole context, the model would only learn to generate moves and their effect on the board, based on a rich context.
	<br> This allowed the model to have a rich representation of the game and predict moves more accurately.
	<br> Accordingly, the data was modified to mask only the moves and their corresponding effects on the board.
	<br> The data now looks as follows:

	~~~
	<s>King's Indian Defense: Fianchetto Variation, Debrecen Defense
	r n b q k b n r
	p p p p p p p p
	. . . . . . . .
	. . . . . . . .
	. . P . . . . .
	. . . . . . . .
	P P . P P P P P
	R N B Q K B N R
	<mask><mask><mask><mask><mask><mask>
	<mask><mask><mask><mask><mask><mask> b q k b . r
	p p p p p p p p
	. . . . . n . .
	. . . . . . . .
	. . P . . . . .
	. . . . . . . .
	P P . P P P P P
	R N B Q K B N R
	m:b1c3
	<mask><mask><mask><mask><mask><mask> b q k b . r
	p p p p p p p p
	. . . . . n . .
	. . . . . . . .
	. . P . . . . .
	. . N . . . . .
	P P . P P P P P
	R . B Q K B N'
	~~~
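
Restricting the masking to moves could be sketched as follows. This is a simplified illustration under stated assumptions: move lines are recognized by the `m:` prefix seen in the samples, each character of a selected move line becomes one `<mask>`, and the masking of the board changes that follow a move is left out. The function name `mask_move_lines` and the default probability are hypothetical.

```python
import random

MASK = "<mask>"      # mask token as it appears in the dataset samples
MOVE_PREFIX = "m:"   # move lines in the samples look like "m:g8f6"


def mask_move_lines(text, mask_prob=0.5, seed=0):
    """Mask whole move lines with probability mask_prob; keep all other lines.

    Hypothetical helper: one MASK per character is an assumption, and the
    board rows affected by the move are not masked in this sketch.
    """
    rng = random.Random(seed)
    out = []
    for line in text.splitlines():
        if line.startswith(MOVE_PREFIX) and rng.random() < mask_prob:
            out.append(MASK * len(line))
        else:
            out.append(line)  # opening name and board rows stay visible
    return "\n".join(out)


sample = "King's Indian Defense\nR N B Q K B N R\nm:g8f6\nR N B Q K B N R"
print(mask_move_lines(sample, mask_prob=1.0))
```

Because the opening name and the board rows are never selected, they stay visible to the model as context, and the training signal comes only from the masked moves.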