nelson2424 committed on
Commit
902c1c7
·
1 Parent(s): 9cb38aa

Update README.md

Files changed (1)
  1. README.md +87 -1
README.md CHANGED
@@ -6,9 +6,95 @@ language:
6
  - en
7
  metrics:
8
  - perplexity
9
  pipeline_tag: text-generation
10
  ---
11
 
12
  This model was created to predict moves in the chess opening.
13
  The idea is to test the impact of modeling the game text differently and report the results.
14
- You can access the different model configurations and results at https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor.
6
  - en
7
  metrics:
8
  - perplexity
9
+ - accuracy
10
  pipeline_tag: text-generation
11
  ---
12
 
13
  This model was created to predict moves in the chess opening.
14
  The idea is to test the impact of modeling the game text differently and report the results.
15
+ The training code is available [here](https://github.com/bit2424/chess_openings_teacher/tree/main/ML/Training).
16
+ The different model configurations and results are available [here](https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor).
17
+
18
+ # Training process:
19
+ - ## Training with V1_small dataset:
20
+ To follow the discussion below, it is important to first check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.
21
+
22
+ During the training process, multiple challenges arose.
23
+ The first problem was that the model's accuracy was low. To mitigate it, I tried the following:
24
+ - <b>Learning rate:</b>
25
+ The first approach was to adjust the learning rate. <br>
26
+ Going a step further, I changed the learning-rate scheduler from a linear schedule to a polynomial decay schedule.
27
+ These changes did not have a significant effect on the accuracy.
28
+ - <b>Probability of masked tokens:</b>
29
+ Decreasing the probability of masking tokens in the dataset increased the accuracy, but at the expense of a weaker predictive capability.
30
+ A model trained with a low masking probability is unable to predict correct moves across different openings.
31
+ - <b>Focus on predicting the moves:</b>
32
+ The current model tries to model the whole text that the V1_small version of the dataset provides, which means it also has to
33
+ <br>predict parts of the board after a move or the name of the opening, as seen in the following example:
34
+ ~~~
35
+ <s>King's Indian <mask>: <mask> Variation, Debrecen Defense
36
+ r n b q k b n r
37
+ p p p p p p p p
38
+ ........
39
+ ........
40
+ .. P.....
41
+ ........
42
+ P P. P <mask> P P P
43
+ R N B Q K B N R
44
+ m:g8f6
45
+ <mask>:<mask>b<mask><mask> b q k b. r
46
+ p p p p p p p p
47
+ ..... n..
48
+ ........
49
+ .. P.....
50
+ ........
51
+ P P. P P P P P
52
+ R N B Q K B N R
53
+ m:b1c3
54
+ <mask><mask><mask>
55
+ <mask><mask> b q k b. r
56
+ p p p p p p p p
57
+ ..... n..
58
+ ........
59
+ .. P.....
60
+ .. N.....
61
+ P P. P P P P P
62
+ R. B Q K B N'
63
+ ~~~
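
As a rough illustration of the kind of masking shown in the example above, the sketch below replaces each token with `<mask>` independently with a fixed probability. It is not the training code: the whitespace split stands in for the real tokenizer, and `random_mask`, the seed, and the probability values are hypothetical names and numbers chosen only for the demonstration.

```python
import random

MASK = "<mask>"

def random_mask(tokens, mask_prob, seed=0):
    """Replace each token with MASK independently with probability mask_prob."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]

# Assumption: a whitespace split stands in for the real tokenizer.
text = "King's Indian Defense: Fianchetto Variation, Debrecen Defense m:g8f6 m:b1c3"
tokens = text.split()

# Illustrative probabilities only; the values used in training are not stated here.
for p in (0.05, 0.15, 0.5):
    masked = random_mask(tokens, p)
    print(p, masked.count(MASK), " ".join(masked))
```

A lower probability leaves more of the text visible, so the model has fewer positions to predict in each sample, including fewer masked moves.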
64
+
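
Returning to the learning-rate item in the list above, the difference between a linear schedule and a polynomial decay schedule can be sketched in plain Python. This is a minimal sketch without warmup; the function names, the initial rate `5e-5`, the end rate `1e-7`, the power `2.0`, and the 1000 steps are all assumptions made for illustration, not the values used in the experiments.

```python
def linear_decay(step, total_steps, lr_init):
    """Learning rate that falls linearly from lr_init to 0 over total_steps."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return lr_init * remaining

def polynomial_decay(step, total_steps, lr_init, lr_end=1e-7, power=2.0):
    """Learning rate that decays from lr_init to lr_end along a polynomial curve."""
    if step >= total_steps:
        return lr_end
    remaining = 1.0 - step / total_steps
    return (lr_init - lr_end) * remaining ** power + lr_end

# Compare the two schedules at a few points of a hypothetical 1000-step run.
for step in (0, 250, 500, 750, 1000):
    print(step, linear_decay(step, 1000, 5e-5), polynomial_decay(step, 1000, 5e-5))
```

With `power=1.0` and `lr_end=0.0` the polynomial schedule reduces to the linear one, so the two only differ once the power or the end rate is changed.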
65
+ <br> After realizing that, due to limited computational resources, my model was not able to learn a function complex enough
66
+ <br> to correctly model the problem at hand, I decided to narrow the scope of the problem.
67
+ <br> Instead of trying to generate the whole context, the model would learn only to generate moves and their effect on the board, based on a rich context.
68
+ <br> This allowed the model to have a rich representation of the game and predict moves more accurately.
69
+ <br> Accordingly, the data was modified to mask only the moves and their corresponding effects on the board.
70
+ <br> The data now looks as follows:
71
+
72
+ ~~~
73
+ <s>King's Indian Defense: Fianchetto Variation, Debrecen Defense
74
+ r n b q k b n r
75
+ p p p p p p p p
76
+ ........
77
+ ........
78
+ .. P.....
79
+ ........
80
+ P P. P P P P P
81
+ R N B Q K B N R
82
+ <mask><mask><mask><mask><mask><mask>
83
+ <mask><mask><mask><mask><mask><mask> b q k b. r
84
+ p p p p p p p p
85
+ ..... n..
86
+ ........
87
+ .. P.....
88
+ ........
89
+ P P. P P P P P
90
+ R N B Q K B N R
91
+ m:b1c3
92
+ <mask><mask><mask><mask><mask><mask> b q k b. r
93
+ p p p p p p p p
94
+ ..... n..
95
+ ........
96
+ .. P.....
97
+ .. N.....
98
+ P P. P P P P P
99
+ R. B Q K B N'
100
+ ~~~
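
One possible way to restrict masking to a move and its effect on the board is sketched below. It is an illustrative interpretation, not the dataset's actual preprocessing: the sample above masks token spans, whereas this sketch masks exactly the squares that differ between two consecutive boards. The helper names `parse_board` and `mask_move_and_effect` are hypothetical, and the boards are assumed to be written with one space between squares.

```python
MASK = "<mask>"

def parse_board(text):
    """Turn 8 lines of space-separated squares into a list of rows."""
    return [line.split() for line in text.strip().splitlines()]

def mask_move_and_effect(before, after):
    """Return a masked move line and `after` with the changed squares masked."""
    masked_after = [
        [MASK if sq_after != sq_before else sq_after
         for sq_before, sq_after in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]
    return MASK, masked_after

# Position before and after the move m:g8f6 from the example above.
before = parse_board("""
r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . P . . . . .
. . . . . . . .
P P . P P P P P
R N B Q K B N R
""")
after = parse_board("""
r n b q k b . r
p p p p p p p p
. . . . . n . .
. . . . . . . .
. . P . . . . .
. . . . . . . .
P P . P P P P P
R N B Q K B N R
""")

masked_move, masked_board = mask_move_and_effect(before, after)
print(masked_move)
for row in masked_board:
    print(" ".join(row))
```

The opening name and every unchanged square stay visible as context, so the only prediction targets are the move itself and the two squares it affects.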