tbs17 committed
Commit
6999b66
1 Parent(s): 518f69e

Update README.md

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -191,4 +191,6 @@ The details of the masking procedure for each sentence are the following:
  + In the 10% remaining cases, the masked tokens are left as is.
 
  #### Pretraining
- The model was trained on an 8-core cloud TPU from Google Colab for 600k steps with a batch size of 128. The sequence length was limited to 512 for the entire time. The optimizer used was Adam with a learning rate of 5e-5, beta_1 = 0.9 and beta_2 = 0.999, a weight decay of 0.01, learning rate warmup for 10,000 steps, and linear decay of the learning rate after.
+ The model was trained on an 8-core cloud TPU from Google Colab for 600k steps with a batch size of 128. The sequence length was limited to 512 for the entire time. The optimizer used was Adam with a learning rate of 5e-5, beta_1 = 0.9 and beta_2 = 0.999, a weight decay of 0.01, learning rate warmup for 10,000 steps, and linear decay of the learning rate after.
+
+ You can refer to the training and fine-tuning code at https://github.com/tbs17/MathBERT.
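
The learning-rate schedule described in the pretraining paragraph (warmup for 10,000 steps, then linear decay over 600k total steps, peak LR 5e-5) can be sketched as a plain Python function. This is a minimal illustration of the stated hyperparameters, not code from the MathBERT repo; the function name and the assumption that decay runs linearly to zero at the final step are illustrative.

```python
# Hyperparameters as stated in the README.
LR = 5e-5              # peak Adam learning rate
WARMUP_STEPS = 10_000  # linear warmup duration
TOTAL_STEPS = 600_000  # total pretraining steps

def learning_rate(step: int) -> float:
    """Linear warmup to LR over WARMUP_STEPS, then linear decay.

    Decaying to zero at TOTAL_STEPS is an assumption; the README only
    says "linear decay of the learning rate after" warmup.
    """
    if step < WARMUP_STEPS:
        return LR * step / WARMUP_STEPS
    return LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))
```

The same schedule is what `get_linear_schedule_with_warmup` in the `transformers` library produces when wrapped around an Adam/AdamW optimizer.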