Update README.md
README.md (CHANGED)
````diff
@@ -63,9 +63,10 @@ encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
 ```
 
-####
-
+#### Comparing to the original BERT on fill-mask tasks
+The original BERT (i.e., bert-base-uncased) has a known issue of biased gender predictions, even though the data it was trained on was fairly neutral. Since our model was trained not on general corpora but on mathematical ones, consisting mostly of equations, symbols, and jargon, it does not show this bias. See below:
 
+```
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
 >>> unmasker("The man worked as a [MASK].")
@@ -113,7 +114,8 @@ output = model(encoded_input)
 'score': 0.03042375110089779,
 'token': 5660,
 'token_str': 'cook'}]
-
+```
+
 
 #### Training data
 The MathBERT model was pretrained on pre-K to high-school math curricula (engageNY, Utah Math, Illustrative Math), college math books from openculture.com, and graduate-level math from arXiv math paper abstracts. In total, about 100M tokens were used for pretraining.
````
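The hunks above show the fill-mask probe only for `bert-base-uncased`; the MathBERT side of the comparison falls outside the diff. A minimal sketch of running the same probe against both models, assuming the hypothetical checkpoint name `tbs17/MathBERT` (the diff never names the MathBERT checkpoint):

```python
from transformers import pipeline

# Run the README's fill-mask probe against both checkpoints.
# NOTE: 'tbs17/MathBERT' is an assumed checkpoint name, not taken from this diff.
for checkpoint in ('bert-base-uncased', 'tbs17/MathBERT'):
    unmasker = pipeline('fill-mask', model=checkpoint)
    print(checkpoint)
    for pred in unmasker("The man worked as a [MASK]."):
        # Each prediction is a dict with 'score', 'token', 'token_str', 'sequence'.
        print(f"  {pred['token_str']}: {pred['score']:.4f}")
```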
|
|
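Both hunk headers also reference the README's TensorFlow usage snippet (`encoded_input = tokenizer(text, return_tensors='tf')`, `output = model(encoded_input)`), which the diff shows only in part. A self-contained sketch of that snippet under the same checkpoint-name assumption, with placeholder input text:

```python
from transformers import AutoTokenizer, TFAutoModel

# Hypothetical checkpoint name; the diff shows only the tail of this snippet.
tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
model = TFAutoModel.from_pretrained('tbs17/MathBERT')

text = "Students learn to add fractions with unlike denominators."  # placeholder input
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)  # model outputs, including last_hidden_state
```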