tbs17 committed on
Commit
0288bf3
1 Parent(s): 4ef51e6

Update README.md

README.md CHANGED
@@ -120,7 +120,7 @@ The BERT model was pretrained on pre-k to HS math curriculum (engageNY, Utah Mat
 
 #### Training procedure
 
-The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,522. The inputs of the model are then of the form:
+The texts are lowercased and tokenized using WordPiece with a customized vocabulary of 30,522 tokens. We use the `bert_tokenizer` from the Hugging Face `tokenizers` library to generate a custom vocab file from our raw math training texts. The inputs to the model are then of the form:
 
 ```
 [CLS] Sentence A [SEP] Sentence B [SEP]
 ```