Commit 70cb507 · 1 Parent(s): 902c1c7
Updated the report on the training process with the V1_small dataset
README.md
CHANGED
@@ -20,7 +20,7 @@ You can access the different model configurations and results [here](https://wan
|
|
20 |
To understand the following discussion, it is important to check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.
|
21 |
|
22 |
During the training process, multiple challenges arose.
|
23 |
-
The first problem was the low accuracy results
|
24 |
- <b>learning rate:</b>
|
25 |
The first approach to solve this problem was to modify the learning rate. <br>
|
26 |
A step further involved changing the lr scheduler from a linear configuration to a polynomial decay configuration.
|
@@ -66,7 +66,7 @@ You can access the different model configurations and results [here](https://wan
|
|
66 |
<br> model the problem at hand due to limited computational resources, I decided to narrow the scope of the problem.
|
67 |
<br> Instead of trying to generate the whole context, the model would only learn to generate moves and the effect they have on the board based on a rich context.
|
68 |
<br> This allowed the model to have a rich representation of the game and predict moves more accurately.
|
69 |
-
<br> As a result, the data was modified to
|
70 |
<br> The data now looks as follows:
|
71 |
|
72 |
~~~
|
|
|
20 |
To understand the following discussion, it is important to check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.
|
21 |
|
22 |
During the training process, multiple challenges arose.
|
23 |
+
-The first problem was the <b>low accuracy</b> of the model's results. To mitigate that problem, I tried the following:
|
24 |
- <b>learning rate:</b>
|
25 |
The first approach to solve this problem was to modify the learning rate. <br>
|
26 |
A step further involved changing the lr scheduler from a linear configuration to a polynomial decay configuration.
|
|
|
66 |
<br> model the problem at hand due to limited computational resources, I decided to narrow the scope of the problem.
|
67 |
<br> Instead of trying to generate the whole context, the model would only learn to generate moves and the effect they have on the board based on a rich context.
|
68 |
<br> This allowed the model to have a rich representation of the game and predict moves more accurately.
|
69 |
+
<br> As a result, the data was modified only to mask move predictions and their corresponding effects on the board.
|
70 |
<br> The data now looks as follows:
|
71 |
|
72 |
~~~
|
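
The README text in this diff mentions switching the lr scheduler from a linear configuration to a polynomial decay configuration. The sketch below illustrates the difference between the two shapes in plain Python. It is not the configuration used in this commit: the function names, `base_lr`, `end_lr`, and `power` are hypothetical values chosen for illustration, and warmup is left out for brevity.

```python
def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linear decay from base_lr down to 0 over total_steps."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


def polynomial_lr(
    step: int,
    total_steps: int,
    base_lr: float,
    end_lr: float = 1e-7,  # assumed floor, not taken from the source
    power: float = 2.0,    # assumed exponent, not taken from the source
) -> float:
    """Polynomial decay from base_lr down to end_lr over total_steps."""
    if step >= total_steps:
        return end_lr
    remaining = 1.0 - step / total_steps
    return (base_lr - end_lr) * remaining ** power + end_lr


if __name__ == "__main__":
    total = 100
    for step in (0, 25, 50, 75, 100):
        lin = linear_lr(step, total, 1e-3)
        poly = polynomial_lr(step, total, 1e-3)
        print(f"step {step:3d}  linear={lin:.2e}  polynomial={poly:.2e}")
```

With `power` greater than 1, the polynomial schedule drops faster early in training and flattens toward `end_lr`, whereas the linear schedule decreases at a constant rate until it reaches zero.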