Update README.md
README.md
@@ -191,4 +191,6 @@ The details of the masking procedure for each sentence are the following:
+ In the 10% remaining cases, the masked tokens are left as is.

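The added line completes the standard BERT masking rule described in this section. As a rough illustrative sketch (not the repository's actual data-processing code), the 80%/10%/10% split applied to tokens selected for masking could look like the following; the function name and the `mask_id`/`vocab_size` parameters are assumptions for illustration only.

```python
import random

def corrupt_selected_token(token_id: int, mask_id: int, vocab_size: int) -> int:
    """Apply the BERT masking rule to a token already chosen for prediction."""
    r = random.random()
    if r < 0.8:
        return mask_id                       # 80% of cases: replace with [MASK]
    elif r < 0.9:
        return random.randrange(vocab_size)  # 10% of cases: replace with a random token
    else:
        return token_id                      # remaining 10% of cases: leave the token as is
```
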
#### Pretraining

The model was trained on an 8-core cloud TPU from Google Colab for 600k steps with a batch size of 128. The sequence length was limited to 512 throughout training. The optimizer used is Adam with a learning rate of 5e-5, β₁ = 0.9 and β₂ = 0.999, a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate afterwards.

+ You can refer to the training and fine-tuning code at https://github.com/tbs17/MathBERT.
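
To make the hyperparameters above concrete, here is a minimal sketch of an equivalent optimizer and learning-rate schedule in PyTorch with the `transformers` scheduler helper. This is not the authors' training script (see the repository linked above for that), and the base checkpoint name is an assumption.

```python
import torch
from transformers import BertForMaskedLM, get_linear_schedule_with_warmup

# Assumed base checkpoint; used here only to obtain parameters to optimize.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Adam with decoupled weight decay (AdamW), one common reading of
# "Adam ... with a weight decay of 0.01".
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
    weight_decay=0.01,
)

# 10,000 warmup steps, then linear decay over the 600k training steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,
    num_training_steps=600_000,
)
```

During training, `scheduler.step()` is called after each `optimizer.step()` so the learning rate follows the warmup-then-linear-decay schedule.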