Commit 44306b6 by pszemraj (1 parent: ba51bc8)

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -27,9 +27,8 @@ print(f"Tokens:\n\t{output.input_ids}")
 
 ## Notes
 
-1. the default tokenizer (on branch `main`) has a vocab size of 32100.
-    - use a model vocab size of 32128 because GPUs like this better
-
+1. the default tokenizer (on branch `main`) has a vocab size of 32000
+2. based on the `SentencePieceBPETokenizer` class
 
 <details>
 <summary>How to Tokenize Text and Retrieve Offsets</summary>
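The removed note recommended a model vocab size of 32128 for the old 32100-token tokenizer "because GPUs like this better". Below is a minimal sketch of that arithmetic. It assumes the intent was to round the tokenizer's vocab size up to a multiple of 128; the commit does not state the multiple, but rounding to a multiple of 64 gives the same 32128 here. The function name `padded_vocab_size` is hypothetical and not part of this repository or any library.

```python
def padded_vocab_size(tokenizer_vocab_size: int, multiple: int = 128) -> int:
    """Round a tokenizer vocab size up to the nearest multiple of `multiple`.

    Hypothetical helper; the default multiple of 128 is an assumption.
    """
    if multiple <= 0:
        raise ValueError("multiple must be a positive integer")
    # Ceiling division: adding (multiple - 1) before floor division rounds up.
    return ((tokenizer_vocab_size + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    # Old default tokenizer (32100) vs. new default tokenizer (32000).
    for size in (32100, 32000):
        print(size, "->", padded_vocab_size(size))
```

With this rounding, 32100 pads up to 32128 (251 × 128), while the new default of 32000 is already a multiple of 128 (250 × 128) and is left unchanged.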