Update README.md
README.md CHANGED
@@ -120,7 +120,7 @@ The BERT model was pretrained on pre-k to HS math curriculum (engageNY, Utah Mat
 
 #### Training procedure
 
-The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,522. The inputs of the model are then of the form:
+The texts are lowercased and tokenized using WordPiece and a custom vocabulary of size 30,522. We use the `bert_tokenizer` from the Hugging Face `tokenizers` library to generate a custom vocab file from our raw math training texts. The inputs of the model are then of the form:
 
 ```
 [CLS] Sentence A [SEP] Sentence B [SEP]
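For reference, a minimal sketch of the vocab-building step described in the added line, assuming the `bert_tokenizer` it mentions corresponds to `BertWordPieceTokenizer` from the `tokenizers` library; the corpus path and output location are placeholders, not the actual training data:

```python
from tokenizers import BertWordPieceTokenizer

# Train a lowercasing WordPiece vocabulary of size 30,522 on raw text files.
# The corpus path here is a placeholder, not the model's actual training data.
tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(files=["math_corpus.txt"], vocab_size=30522)
tokenizer.save_model(".")  # writes vocab.txt, reusable for BERT pretraining

# Reload from the trained vocab so the BERT post-processor is attached,
# then encode a sentence pair in the input form shown above.
tokenizer = BertWordPieceTokenizer.from_file("vocab.txt", lowercase=True)
encoding = tokenizer.encode("Sentence A", "Sentence B")
print(encoding.tokens)  # ['[CLS]', ..., '[SEP]', ..., '[SEP]']
```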