Update README.md
README.md
SOBertBase is a 109M parameter BERT model trained on 27 billion tokens of SO data.
SOBert is pre-trained on 19 GB of data presented as 15 million samples, where each sample contains an entire post and all of its corresponding comments. We also include
all code in each answer, so that our model is bimodal in nature. We use a SentencePiece tokenizer trained with Byte-Pair Encoding, which has the benefit over WordPiece of never labeling tokens as “unknown”.
Additionally, SOBert is trained with a maximum sequence length of 2048, based on the empirical length distribution of StackOverflow posts, and a relatively
large batch size of 0.5M tokens. A larger 762 million parameter model can also be found [here](https://huggingface.co/mmukh/SOBertLarge). More details can be found in the paper
[Stack Over-Flowing with Results: The Case for Domain-Specific Pre-Training Over One-Size-Fits-All Models](https://arxiv.org/pdf/2306.03268).

#### How to use
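
A minimal loading sketch with the Hugging Face `transformers` library. The repo id `mmukh/SOBertBase` is an assumption here (only the SOBertLarge link appears above); substitute the actual Hub id if it differs, and pass `trust_remote_code=True` if the checkpoint ships a custom architecture.

```python
# Minimal sketch: load the model and embed a StackOverflow-style post.
# NOTE: the repo id "mmukh/SOBertBase" is assumed by analogy with the
# SOBertLarge link above; adjust it to the actual Hub id if needed.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("mmukh/SOBertBase")
model = AutoModel.from_pretrained("mmukh/SOBertBase")

# SOBert is bimodal, so posts can mix natural language and code.
post = "How do I reverse a list in Python?\nmy_list[::-1]"
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=2048)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into a single post embedding.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)
```

The `max_length=2048` above simply mirrors the pre-training sequence length described earlier; shorter limits also work if memory is a concern.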