nelson2424/distilroberta-base-finetuned-cot

@@ -20,7 +20,7 @@ You can access the different model configurations and results [here](https://wan
     To understand the following discussion, it is important to check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.
     During the training process, multiple challenges arose.
-    The first problem was the low accuracy results it was getting, to mitigate that problem I tried the following:
     - <b>learning rate:</b>
       The first approach to solve this problem was to modify the learning rate. <br>
       A step further involved changing the lr scheduler from a linear configuration to a polynomial decay configuration.
@@ -66,7 +66,7 @@ You can access the different model configurations and results [here](https://wan
       <br> model the problem at hand due to limited computational resources, I decided to narrow the scope of the problem.
       <br> Instead of trying to generate the whole context, the model would only learn to generate moves and the effect they have on the board based on a rich context.
       <br> This allowed the model to have a rich representation of the game and predict moves more accurately.
-      <br> As a result, the data was modified to only mask move predictions and their corresponding effects on the board.
       <br> The data now looks as follows:
       ~~~

     To understand the following discussion, it is important to check the structure of the V1_small version of the [nelson2424/Chess_openings_dataset](https://huggingface.co/datasets/nelson2424/Chess_openings_dataset) dataset.
     During the training process, multiple challenges arose.
+    - The first problem was the <b>low accuracy</b> of the model's results. To mitigate that problem, I tried the following:
     - <b>learning rate:</b>
       The first approach to solve this problem was to modify the learning rate. <br>
       A step further involved changing the lr scheduler from a linear configuration to a polynomial decay configuration.
       <br> model the problem at hand due to limited computational resources, I decided to narrow the scope of the problem.
       <br> Instead of trying to generate the whole context, the model would only learn to generate moves and the effect they have on the board based on a rich context.
       <br> This allowed the model to have a rich representation of the game and predict moves more accurately.
+      <br> As a result, the data was modified only to mask move predictions and their corresponding effects on the board.
       <br> The data now looks as follows:
       ~~~