mmukh committed
Commit 453e405 · 1 Parent(s): 1cf139f

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -7,7 +7,7 @@ SOBertBase is a 109M parameter BERT model trained on 27 billion tokens of SO data
  SOBert is pre-trained with 19 GB of data presented as 15 million samples, where each sample contains an entire post and all its corresponding comments. We also include
  all code in each answer so that our model is bimodal in nature. We use a SentencePiece tokenizer trained with Byte-Pair Encoding, which has the benefit over WordPiece of never labeling tokens as "unknown".
  Additionally, SOBert is trained with a maximum sequence length of 2048, based on the empirical length distribution of StackOverflow posts, and a relatively
- large batch size of 0.5M tokens. A larger 762 million parameter model can also be found. More details can be found in the paper
+ large batch size of 0.5M tokens. A larger 762 million parameter model can also be found [here](https://huggingface.co/mmukh/SOBertLarge). More details can be found in the paper
  [Stack Over-Flowing with Results: The Case for Domain-Specific Pre-Training Over One-Size-Fits-All Models](https://arxiv.org/pdf/2306.03268).
 
  #### How to use
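
The body of the "How to use" section is not shown in this hunk. A minimal sketch of loading the checkpoint with the Hugging Face transformers library might look like the following; the repo id `mmukh/SOBertBase` and the use of `AutoTokenizer`/`AutoModel` are assumptions inferred from the SOBertLarge link above, not details taken from this diff.

```python
# Minimal sketch (assumptions: repo id "mmukh/SOBertBase" and AutoModel-compatible weights).
from transformers import AutoTokenizer, AutoModel

model_name = "mmukh/SOBertBase"  # assumed repo id, inferred from the SOBertLarge link

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# A StackOverflow-style bimodal input: natural language plus code from an answer.
post = "How do I reverse a list in Python?\n\nxs = [1, 2, 3]\nprint(xs[::-1])"

# The model was pre-trained with a 2048-token context, so truncate to that length.
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=2048)
outputs = model(**inputs)

# Per-token contextual embeddings: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```

The `last_hidden_state` embeddings can then be pooled or fed to a task head for downstream StackOverflow tasks.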