Update README.md
README.md
@@ -191,4 +191,6 @@ The details of the masking procedure for each sentence are the following:
+ In the 10% remaining cases, the masked tokens are left as is.

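The added line completes the standard BERT masking rule described in this section. As a rough illustrative sketch (not the repository's actual data-processing code), the 80%/10%/10% split applied to tokens selected for masking could look like the following; the function name and the `mask_id`/`vocab_size` parameters are assumptions for illustration only.

```python
import random

def corrupt_selected_token(token_id: int, mask_id: int, vocab_size: int) -> int:
    """Apply the BERT masking rule to a token already chosen for prediction."""
    r = random.random()
    if r < 0.8:
        return mask_id                       # 80% of cases: replace with [MASK]
    elif r < 0.9:
        return random.randrange(vocab_size)  # 10% of cases: replace with a random token
    else:
        return token_id                      # remaining 10% of cases: leave the token as is
```
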
#### Pretraining

The model was trained on an 8-core cloud TPU from Google Colab for 600k steps with a batch size of 128. The sequence length was limited to 512 throughout training. The optimizer used is Adam with a learning rate of 5e-5, β₁ = 0.9 and β₂ = 0.999, a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate afterwards.

+ You can refer to the training and fine-tuning code at https://github.com/tbs17/MathBERT.
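
To make the hyperparameters above concrete, here is a minimal sketch of an equivalent optimizer and learning-rate schedule in PyTorch with the `transformers` scheduler helper. This is not the authors' training script (see the repository linked above for that), and the base checkpoint name is an assumption.

```python
import torch
from transformers import BertForMaskedLM, get_linear_schedule_with_warmup

# Assumed base checkpoint; used here only to obtain parameters to optimize.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Adam with decoupled weight decay (AdamW), one common reading of
# "Adam ... with a weight decay of 0.01".
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
    weight_decay=0.01,
)

# 10,000 warmup steps, then linear decay over the 600k training steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,
    num_training_steps=600_000,
)
```

During training, `scheduler.step()` is called after each `optimizer.step()` so the learning rate follows the warmup-then-linear-decay schedule.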