|
--- |
|
license: mit |
|
datasets: |
|
- nelson2424/Chess_openings_dataset |
|
language: |
|
- en |
|
metrics: |
|
- perplexity |
|
- accuracy |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
This model was created to predict moves in chess openings. |
|
The idea is to test how different ways of modeling the game as text affect the results, and to report the findings. |
|
You can access the code for training [here](https://github.com/bit2424/chess_openings_teacher/tree/main/ML/Training) |
|
You can access the different model configurations and results [here](https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor) |
|
|
|
# Training process: |
|
- ## Training with V1_small dataset: |
|
To follow the discussion below, it is important to first check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset. |
|
|
|
During the training process, multiple challenges arose. |
|
- The first problem was the <b>low accuracy</b> of the model's predictions. To mitigate it, I tried the following: |
|
- <b>learning rate:</b> |
|
The first approach to solving this problem was to adjust the learning rate. <br> |
|
The next step was to change the learning rate scheduler from linear decay to polynomial decay. |
|
These changes did not have a significant effect on the accuracy. |
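
To make the difference between the two schedules concrete, here is a minimal sketch in plain Python. It is not taken from the training code: the function names, the `power` value, and the learning rates in the usage note are assumptions chosen only for illustration.

```python
def linear_decay(step: int, total_steps: int, lr_init: float, lr_end: float = 0.0) -> float:
    """Learning rate that falls in a straight line from lr_init to lr_end."""
    progress = min(step, total_steps) / total_steps
    return lr_end + (lr_init - lr_end) * (1.0 - progress)


def polynomial_decay(step: int, total_steps: int, lr_init: float,
                     lr_end: float = 0.0, power: float = 2.0) -> float:
    """Learning rate that follows (1 - progress) ** power.

    With power == 1.0 this reduces to linear decay. The default power of 2.0
    is a hypothetical value, not the one used in the reported runs.
    """
    progress = min(step, total_steps) / total_steps
    return lr_end + (lr_init - lr_end) * (1.0 - progress) ** power
```

For example, with a hypothetical initial rate of `1e-4` over 100 steps, `linear_decay(50, 100, 1e-4)` gives half of the initial rate, while `polynomial_decay(50, 100, 1e-4, power=2.0)` gives a quarter of it, so with a power above 1 the polynomial schedule lowers the rate faster early in training.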
|
- <b>Probability of masked tokens:</b> |
|
Decreasing the probability of masking tokens in the dataset increased the accuracy, but at the expense of weaker predictive capability. |
|
A model trained with a low masking probability is unable to predict correct moves across different openings. |
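
The masking probability discussed above can be illustrated with a short sketch. This is a simplified stand-in, not the masking used in the training code: the function name `mask_tokens`, the per-token independent sampling, and the `seed` parameter are all assumptions.

```python
import random

MASK = "<mask>"


def mask_tokens(tokens, mask_prob, seed=None):
    """Replace each token with MASK independently with probability mask_prob.

    Returns (masked_tokens, labels). labels holds the original token at masked
    positions and None elsewhere, so only masked positions would be scored.
    """
    rng = random.Random(seed)  # seeded generator for reproducible masking
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

With a low `mask_prob`, most positions stay visible, so each training example asks the model to recover only a few tokens. That is one plausible reading of why accuracy on those positions rises while the model generalizes poorly to other openings.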
|
- <b>Focus on predicting the moves:</b> |
|
The current model tries to model the whole text provided by the V1_small version of the dataset, which includes predicting parts of the board after a move or the name of the opening, as seen in the following example: |
|
~~~ |
|
<s>King's Indian <mask>: <mask> Variation, Debrecen Defense |
|
r n b q k b n r |
|
p p p p p p p p |
|
........ |
|
........ |
|
.. P..... |
|
........ |
|
P P. P <mask> P P P |
|
R N B Q K B N R |
|
m:g8f6 |
|
<mask>:<mask>b<mask><mask> b q k b. r |
|
p p p p p p p p |
|
..... n.. |
|
........ |
|
.. P..... |
|
........ |
|
P P. P P P P P |
|
R N B Q K B N R |
|
m:b1c3 |
|
<mask><mask><mask> |
|
<mask><mask> b q k b. r |
|
p p p p p p p p |
|
..... n.. |
|
........ |
|
.. P..... |
|
.. N..... |
|
P P. P P P P P |
|
R. B Q K B N' |
|
~~~ |
|
|
|
<br> After realizing that, with limited computational resources, my model could not learn a function complex enough to model the problem correctly, I decided to narrow the scope of the problem. |
|
<br> Instead of trying to generate the whole context, the model would only learn to generate moves and their effect on the board, based on a rich context. |
|
<br> This allowed the model to keep a rich representation of the game and predict moves more accurately. |
|
<br> As a result, the data was modified to mask only the move predictions and their corresponding effects on the board. |
|
<br> The data now looks as follows: |
|
|
|
~~~ |
|
<s>King's Indian Defense: Fianchetto Variation, Debrecen Defense |
|
r n b q k b n r |
|
p p p p p p p p |
|
........ |
|
........ |
|
.. P..... |
|
........ |
|
P P. P P P P P |
|
R N B Q K B N R |
|
<mask><mask><mask><mask><mask><mask> |
|
<mask><mask><mask><mask><mask><mask> b q k b. r |
|
p p p p p p p p |
|
..... n.. |
|
........ |
|
.. P..... |
|
........ |
|
P P. P P P P P |
|
R N B Q K B N R |
|
m:b1c3 |
|
<mask><mask><mask><mask><mask><mask> b q k b. r |
|
p p p p p p p p |
|
..... n.. |
|
........ |
|
.. P..... |
|
.. N..... |
|
P P. P P P P P |
|
R. B Q K B N' |
|
~~~ |
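
The move-focused masking shown above can be approximated with the following sketch. It rests on several assumptions: move lines are identified by the `m:` prefix seen in the examples, each character of a move becomes one `<mask>` (which matches the six masks replacing `m:g8f6`), and the function name `mask_move_lines` is hypothetical. It masks only the move line itself; masking the move's effect on the following board is left out because the exact rule is not described here.

```python
import random

MASK = "<mask>"


def mask_move_lines(text, mask_prob=1.0, seed=None):
    """Mask whole move lines (those starting with 'm:') and leave the rest intact.

    Each selected move line is replaced by one MASK per character, assuming a
    character-level tokenization of moves. mask_prob controls how often a move
    line is selected for masking.
    """
    rng = random.Random(seed)
    out = []
    for line in text.splitlines():
        if line.startswith("m:") and rng.random() < mask_prob:
            out.append(MASK * len(line))
        else:
            out.append(line)  # opening name and board rows stay visible
    return "\n".join(out)
```

Calling `mask_move_lines("R N B Q K B N R\nm:g8f6\nr n b q k b . r")` leaves both board rows untouched and turns the middle line into six `<mask>` tokens, so the opening name and the boards remain available as context for the prediction.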