Update vocab size #42
by mathemakitten - opened
README.md CHANGED
@@ -191,7 +191,7 @@ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a l
 
 - A simple pre-tokenization rule, no normalization
 
-- A vocabulary size of 250,
+- A vocabulary size of 250,880
 
 It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
 
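For anyone who wants to verify the corrected figure, here is a minimal sketch using the `transformers` library. It is not part of the PR; the `bigscience/bloom` repo id is assumed from context, and the snippet needs network access to the Hub.

```python
# Minimal check of the vocabulary size quoted in the model card.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("bigscience/bloom")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")

# The model config's vocab_size should match the 250,880 figure
# introduced by this change.
print(config.vocab_size)

# The tokenizer's own entry count can differ from the model's
# (padded) embedding size, so it is printed separately here.
print(len(tokenizer))
```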