jacobfulano committed on
Commit 086b093 · 1 Parent(s): 2fd269e

Update README.md

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -11,16 +11,19 @@ inference: false
 
 MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
 MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
-Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).
+Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased). It incorporates efficiency insights
+from the past half a decade of transformers research, from RoBERTa to T5 and GPT.
+
 
 __This model was trained with [ALiBi](https://arxiv.org/abs/2108.12409) on a sequence length of 1024 tokens.__
 
 ALiBi allows a model trained with a sequence length n to easily extrapolate to sequence lengths >2n during finetuning. For more details, see [Train Short, Test Long: Attention with Linear
 Biases Enables Input Length Extrapolation (Press et al. 2022)](https://arxiv.org/abs/2108.12409)
 
-It is part of the family of MosaicBERT-Base models:
+It is part of the **family of MosaicBERT-Base models** trained using ALiBi on different sequence lengths:
 
 * [mosaic-bert-base](https://huggingface.co/mosaicml/mosaic-bert-base) (trained on a sequence length of 128 tokens)
+* [mosaic-bert-base-seqlen-256](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-256)
 * [mosaic-bert-base-seqlen-512](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-512)
 * mosaic-bert-base-seqlen-1024
 * [mosaic-bert-base-seqlen-2048](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-2048)
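Since ALiBi supplies positional information through attention biases rather than learned position embeddings, the 1024-token checkpoint can be fine-tuned on longer inputs. A minimal sketch of one way to set that up, assuming the repo's remote configuration exposes an `alibi_starting_size` field as the other MosaicBERT checkpoints do (verify against the repo's configuration code before relying on it):

```python
from transformers import AutoConfig, AutoModelForMaskedLM

# Assumption: the remote MosaicBERT config has an `alibi_starting_size` field that
# sizes the precomputed ALiBi bias matrix; it is not a hard positional limit.
config = AutoConfig.from_pretrained(
    'mosaicml/mosaic-bert-base-seqlen-1024', trust_remote_code=True
)
config.alibi_starting_size = 2048  # target a finetuning sequence length beyond 1024

mlm = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base-seqlen-1024',
    config=config,
    trust_remote_code=True,
)
```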
@@ -40,7 +43,7 @@ April 2023
 
 ```python
 from transformers import AutoModelForMaskedLM
-mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
+mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024', trust_remote_code=True)
 ```
 
 The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
@@ -56,7 +59,7 @@ To use this model directly for masked language modeling, use `pipeline`:
 from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
 
 tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
-mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
+mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024', trust_remote_code=True)
 
 classifier = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
 
@@ -73,7 +76,7 @@ This model requires that `trust_remote_code=True` be passed to the `from_pretrai
 
 ```python
 mlm = AutoModelForMaskedLM.from_pretrained(
-    'mosaicml/mosaic-bert-base',
+    'mosaicml/mosaic-bert-base-seqlen-1024',
     trust_remote_code=True,
     revision='24512df',
 )
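The `classifier` built in the fill-mask snippet above can be called directly on text containing the tokenizer's `[MASK]` token; a short usage sketch (scores and predicted tokens will vary by checkpoint):

```python
# The fill-mask pipeline returns a ranked list of dicts with
# 'sequence', 'score', 'token', and 'token_str' keys.
predictions = classifier("The capital of France is [MASK].")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```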
 