nelson2424
/

distilroberta-base-finetuned-cot

@@ -6,9 +6,95 @@ language:
 - en
 metrics:
 - perplexity
 pipeline_tag: text-generation
 ---
 This model was created to predict moves in the chess opening.
 The idea is to test the impact of modeling the game text differently and report the results.
-You can access the different model configurations and results at https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor.

 - en
 metrics:
 - perplexity
+- accuracy
 pipeline_tag: text-generation
 ---
 This model was created to predict moves in the chess opening.
 The idea is to test the impact of modeling the game text differently and report the results.
+You can access the training code [here](https://github.com/bit2424/chess_openings_teacher/tree/main/ML/Training).
+You can access the different model configurations and results [here](https://wandb.ai/nelsonquinones2424/Chess%20Openings%20Tutor).
+# Training process:
+  - ## Training with V1_small dataset:
+    To follow the discussion below, first check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.
+    During the training process, multiple challenges arose.
+    The first problem was the model's low accuracy. To mitigate it, I tried the following:
+    - <b>Learning rate:</b>
+      The first approach was to modify the learning rate. <br>
+      I then went a step further and changed the learning-rate scheduler from a linear schedule to a polynomial decay schedule.
+      These changes did not have a significant effect on the accuracy.
+    - <b>Probability of masked tokens:</b>
+      Lowering the probability with which tokens are masked in the dataset increased the accuracy, but at the expense of weaker predictive capability.
+      A model trained with a low masking probability is unable to predict correct moves across different openings.
+    - <b>Focus on predicting the moves:</b>
+      The current model tries to model the whole text provided by the V1_small version of the dataset,
+      <br>including parts of the board after a move and the name of the opening, as seen in the following example:
+      ~~~
+        <s>King's Indian <mask>: <mask> Variation, Debrecen Defense
+        r n b q k b n r
+        p p p p p p p p
+        ........
+        ........
+        .. P.....
+        ........
+        P P. P <mask> P P P
+        R N B Q K B N R
+        m:g8f6
+        <mask>:<mask>b<mask><mask> b q k b. r
+        p p p p p p p p
+        ..... n..
+        ........
+        .. P.....
+        ........
+        P P. P P P P P
+        R N B Q K B N R
+        m:b1c3
+        <mask><mask><mask>
+        <mask><mask> b q k b. r
+        p p p p p p p p
+        ..... n..
+        ........
+        .. P.....
+        .. N.....
+        P P. P P P P P
+        R. B Q K B N'
+      ~~~
+      <br> Because of limited computational resources, the model could not learn a function complex enough
+      <br> to model the whole problem, so I decided to narrow its scope.
+      <br> Instead of generating the whole context, the model only learns to generate moves and their effect on the board, given a rich context.
+      <br> This lets the model keep a rich representation of the game while predicting moves more accurately.
+      <br> Accordingly, the data was modified to mask only move predictions and their corresponding effects on the board.
+      <br> The data now looks as follows:
+      ~~~
+        <s>King's Indian Defense: Fianchetto Variation, Debrecen Defense
+         r n b q k b n r
+        p p p p p p p p
+        ........
+        ........
+        .. P.....
+        ........
+        P P. P P P P P
+        R N B Q K B N R
+        <mask><mask><mask><mask><mask><mask>
+        <mask><mask><mask><mask><mask><mask> b q k b. r
+        p p p p p p p p
+        ..... n..
+        ........
+        .. P.....
+        ........
+        P P. P P P P P
+        R N B Q K B N R
+        m:b1c3
+        <mask><mask><mask><mask><mask><mask> b q k b. r
+        p p p p p p p p
+        ..... n..
+        ........
+        .. P.....
+        .. N.....
+        P P. P P P P P
+        R. B Q K B N'
+      ~~~
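The idea of masking only the moves can be illustrated with a minimal sketch. It is not the training code from the repository: the function name `mask_moves`, the `MOVE_LINE` pattern, and the choice of one `<mask>` per character are assumptions made for illustration, and a real pipeline would more likely mask at the token level. The sketch only handles the move lines (`m:g8f6`); masking the board rows affected by a move could be added in the same way.

```python
import random
import re
from typing import Optional

MASK = "<mask>"

# Assumption: a move line looks like "m:g8f6" (UCI notation, with an
# optional promotion piece such as "m:e7e8q").
MOVE_LINE = re.compile(r"^m:[a-h][1-8][a-h][1-8][qrbn]?$")


def mask_moves(text: str, mask_prob: float = 1.0, seed: Optional[int] = None) -> str:
    """Mask move lines with probability `mask_prob`; keep every other line.

    A masked line is replaced by one <mask> per character of the stripped
    move string (hypothetical choice; leading whitespace is dropped).
    """
    rng = random.Random(seed)
    masked_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if MOVE_LINE.match(stripped) and rng.random() < mask_prob:
            masked_lines.append(MASK * len(stripped))
        else:
            # Opening name and board rows are left as context.
            masked_lines.append(line)
    return "\n".join(masked_lines)


sample = "r n b q k b n r\nm:g8f6\nr n b q k b . r"
print(mask_moves(sample))
```

With `mask_prob=1.0` every move line is hidden while the opening name and board rows stay visible as context. Lowering `mask_prob` hides fewer moves, which corresponds to the masking-probability trade-off discussed above.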