Commit 902c1c7
Parent(s): 9cb38aa

Update README.md

README.md CHANGED
~~~diff
@@ -6,9 +6,95 @@ language:
 - en
 metrics:
 - perplexity
+- accuracy
 pipeline_tag: text-generation
 ---
 
 This model was created to predict moves in the chess opening.
 The idea is to test the impact of modeling the game text differently and report the results.
-You can access the
+You can access the code for training [here](https://github.com/bit2424/chess_openings_teacher/tree/main/ML/Training)
+You can access the different model configurations and results [here](https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor)
+
~~~
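The front matter now lists `accuracy` alongside `perplexity`. Below is a minimal sketch of how these two numbers are commonly computed for a masked language model. It assumes that accuracy is measured only over masked positions and that unmasked positions carry the label `-100`, which is the convention used by Hugging Face `transformers` data collators. The README does not say how its metrics were computed, and the function names are hypothetical.

```python
import math

# Label for positions the model is not asked to predict (Hugging Face convention; assumed here).
IGNORE_INDEX = -100


def masked_accuracy(predicted_ids: list[int], label_ids: list[int]) -> float:
    """Share of masked positions whose predicted token id matches the true id."""
    scored = [(p, y) for p, y in zip(predicted_ids, label_ids) if y != IGNORE_INDEX]
    if not scored:
        return 0.0
    return sum(p == y for p, y in scored) / len(scored)


def perplexity(token_losses: list[float]) -> float:
    """Exponential of the mean per-token cross-entropy (natural log)."""
    return math.exp(sum(token_losses) / len(token_losses))


if __name__ == "__main__":
    # Two of the three scored (masked) positions are predicted correctly.
    print(masked_accuracy([5, 7, 9, 2], [5, IGNORE_INDEX, 3, 2]))
    print(perplexity([math.log(2), math.log(8)]))
```

Because only masked positions are scored, runs with different masking probabilities are evaluated on different sets of targets, which is worth keeping in mind when comparing their accuracy.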
# Training process:
- ## Training with the V1_small dataset:
To understand the following discussion, it is important to check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.

During the training process, multiple challenges arose.
The first problem was the model's low accuracy. To mitigate it, I tried the following:
- <b>Learning rate:</b>
The first approach was to modify the learning rate. <br>
A step further involved changing the learning-rate scheduler from linear decay to polynomial decay.
These changes did not have a significant effect on the accuracy.
- <b>Probability of masked tokens:</b>
Decreasing the probability of masking tokens in the dataset increased the accuracy, but at the expense of a weaker prediction capability.
A low masking probability results in a model that cannot predict correct moves across different openings.
- <b>Focus on predicting the moves:</b>
At that stage, the model was trying to model the whole text that the V1_small version of the dataset provides, which includes predicting parts of the board after a move, or the name of the opening, as seen in the following example:
~~~
<s>King's Indian <mask>: <mask> Variation, Debrecen Defense
r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . P . . . . .
. . . . . . . .
P P . P <mask> P P P
R N B Q K B N R
m:g8f6
<mask>:<mask>b<mask><mask> b q k b . r
p p p p p p p p
. . . . . n . .
. . . . . . . .
. . P . . . . .
. . . . . . . .
P P . P P P P P
R N B Q K B N R
m:b1c3
<mask><mask><mask>
<mask><mask> b q k b . r
p p p p p p p p
. . . . . n . .
. . . . . . . .
. . P . . . . .
. . N . . . . .
P P . P P P P P
R . B Q K B N'
~~~
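In the example above, masks fall anywhere in the text: in the opening name, on board squares, and around the move lines. Below is a minimal sketch of that kind of uniform masking, where each token is hidden independently with probability `mask_probability` (the "probability of masked tokens" discussed earlier). Assumptions: whitespace tokenization instead of the real subword tokenizer, and every selected token is replaced by `<mask>` (BERT-style training sometimes keeps or randomizes selected tokens instead). If the training code uses Hugging Face's `DataCollatorForLanguageModeling`, the corresponding setting would be its `mlm_probability` argument; the README does not say which collator was used.

```python
import random

MASK = "<mask>"


def random_mask(tokens: list[str], mask_probability: float, seed=None):
    """Hide each token independently with probability `mask_probability`.

    Returns the masked token list and a parallel list of labels: the original
    token where a mask was applied, None where the model is not scored.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        # The start-of-sequence marker is never masked (no random draw is made for it).
        if token != "<s>" and rng.random() < mask_probability:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    sample = "<s> r n b q k b n r p p p p p p p p m:g8f6".split()
    for p in (0.15, 0.5):
        masked, labels = random_mask(sample, p, seed=0)
        n_targets = sum(label is not None for label in labels)
        print(p, n_targets, " ".join(masked))
```

Lowering `mask_probability` leaves fewer prediction targets per sample and more visible context around each one.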
<br> After realizing that, due to limited computational resources, my model could not learn a function complex enough to model the problem correctly, I decided to narrow the scope of the problem.
<br> Instead of trying to generate the whole context, the model would only learn to generate moves and their effect on the board, based on a rich context.
<br> This allowed the model to build a rich representation of the game and to predict moves more accurately.
<br> As a result, the data was modified to mask only the moves and their corresponding effects on the board.
<br> The data now looks as follows:
~~~
<s>King's Indian Defense: Fianchetto Variation, Debrecen Defense
r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . P . . . . .
. . . . . . . .
P P . P P P P P
R N B Q K B N R
<mask><mask><mask><mask><mask><mask>
<mask><mask><mask><mask><mask><mask> b q k b . r
p p p p p p p p
. . . . . n . .
. . . . . . . .
. . P . . . . .
. . . . . . . .
P P . P P P P P
R N B Q K B N R
m:b1c3
<mask><mask><mask><mask><mask><mask> b q k b . r
p p p p p p p p
. . . . . n . .
. . . . . . . .
. . P . . . . .
. . N . . . . .
P P . P P P P P
R . B Q K B N'
~~~
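The revised masking idea can be sketched as follows: moves and the board squares they change are hidden, while the opening name and the rest of the board stay visible as context. This is a hypothetical helper that works on whole squares and whole move lines. It assumes the board layout shown in the examples (8 rows of 8 space-separated squares, `.` for an empty square, `m:` followed by the move in UCI notation). It is not necessarily how the original preprocessing chose mask positions, which operates on tokenizer tokens.

```python
MASK = "<mask>"


def mask_move_effects(board_before: str, board_after: str) -> str:
    """Return `board_after` with every square that differs from `board_before` masked."""
    masked_rows = []
    for row_before, row_after in zip(board_before.splitlines(), board_after.splitlines()):
        squares = [
            MASK if after != before else after
            for before, after in zip(row_before.split(), row_after.split())
        ]
        masked_rows.append(" ".join(squares))
    return "\n".join(masked_rows)


def build_sample(opening: str, boards: list[str], moves: list[str], masked_plies: set[int]) -> str:
    """Assemble '<s>' + opening name, the first board, then a move line and a board per ply.

    `boards` holds len(moves) + 1 positions. For plies in `masked_plies`, the whole
    move line becomes a single mask and only the changed squares of the next board are masked.
    """
    lines = [f"<s>{opening}", boards[0]]
    for ply, move in enumerate(moves):
        before, after = boards[ply], boards[ply + 1]
        if ply in masked_plies:
            lines += [MASK, mask_move_effects(before, after)]
        else:
            lines += [f"m:{move}", after]
    return "\n".join(lines)


if __name__ == "__main__":
    after_c4 = "\n".join([
        "r n b q k b n r", "p p p p p p p p", ". . . . . . . .", ". . . . . . . .",
        ". . P . . . . .", ". . . . . . . .", "P P . P P P P P", "R N B Q K B N R",
    ])
    after_nf6 = "\n".join([
        "r n b q k b . r", "p p p p p p p p", ". . . . . n . .", ". . . . . . . .",
        ". . P . . . . .", ". . . . . . . .", "P P . P P P P P", "R N B Q K B N R",
    ])
    print(build_sample("King's Indian Defense: Fianchetto Variation, Debrecen Defense",
                       [after_c4, after_nf6], ["g8f6"], masked_plies={0}))
```

Because the opening name and the unchanged squares stay visible, each masked target is predicted from a rich context, which is the narrowing of scope described above.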
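The learning-rate experiment from the training notes (switching the scheduler from linear decay to polynomial decay) can be illustrated with the two schedules below. They are a plain-Python sketch that mirrors the formulas of `get_linear_schedule_with_warmup` and `get_polynomial_decay_schedule_with_warmup` in Hugging Face `transformers`, including the latter's defaults `lr_end=1e-7` and `power=1.0`. The README does not report the base learning rate, warmup, or power that were used, so the values in the example are hypothetical.

```python
def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def polynomial_decay_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int,
                        lr_end: float = 1e-7, power: float = 1.0) -> float:
    """Linear warmup, then (base_lr - lr_end) * remaining_fraction**power + lr_end."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        return lr_end
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (base_lr - lr_end) * pct_remaining ** power + lr_end


if __name__ == "__main__":
    # Hypothetical settings: 1,000 steps, 100 warmup steps, peak learning rate 5e-5.
    for step in (0, 100, 550, 1000):
        print(step, linear_lr(step, 5e-5, 100, 1000),
              polynomial_decay_lr(step, 5e-5, 100, 1000, power=2.0))
```

With `power=1.0` the polynomial schedule is almost identical to the linear one (it ends at `lr_end` instead of 0); larger `power` values front-load the decay.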