davda54 committed
Commit 3bfa4da · verified · 1 parent: dac673f

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -25,9 +25,9 @@ continuously pretrained on a total of 260 billion subword tokens -- using a mix
 
 ## Tokenizer
 
-This model uses a new tokenizer, specially trained on the target languages. Therefore it offers substantially faster inference than the original Mistral-Nemo-Base-2407 model. Here are the subword-to-word split ratios accross different languages:
+This model uses a new tokenizer, specially trained on the target languages. Therefore it offers substantially faster inference than the original Mistral-Nemo-Base-2407 model. Here are the subword-to-word split ratios across different languages:
 
-| Tokenizer | # tokens | Bokmål | Nynorsk | Sámi | Danish | Swedish | English |
-|------------|--------|--------|---------|-------|--------|---------|---------|
-| Mistral-Nemo-Base-2407 | 51200 | 1.79 | 1.87 | 2.63 | 1.82 | 2.00 | 1.33 |
-| NorMistral-11b-warm | 131072 | 1.22 | 1.28 | 1.82 | 1.33 | 1.39 | 1.29 |
+| Tokenizer | # tokens | Bokmål | Nynorsk | Sámi | Danish | Swedish |
+|------------|--------|--------|---------|-------|--------|---------|
+| Mistral-Nemo-Base-2407 | 51200 | 1.79 | 1.87 | 2.63 | 1.82 | 2.00 |
+| NorMistral-11b-warm | 131072 | 1.22 | 1.28 | 1.82 | 1.33 | 1.39 |
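For readers wanting to check how numbers like the table's split ratios are obtained, here is a minimal sketch: the subword-to-word split ratio is the token count divided by the whitespace-word count. The toy chunking tokenizer and the sample sentence are illustrative assumptions only; a real measurement would use the model's actual tokenizer (e.g. loaded with Hugging Face `AutoTokenizer`) on a representative corpus per language.

```python
# Sketch (assumption, not part of this commit): subword-to-word split
# ratio = number of subword tokens / number of whitespace-separated words.
def split_ratio(tokenize, text):
    words = text.split()
    tokens = tokenize(text)
    return len(tokens) / len(words)

# Toy stand-in tokenizer: splits each word into fixed-size character
# chunks. Replace with the model's real tokenizer for actual numbers.
def toy_tokenize(text, chunk=3):
    return [w[i:i + chunk] for w in text.split() for i in range(0, len(w), chunk)]

# "jeg liker brunost" -> 3 words, 6 chunks -> ratio 2.0
print(split_ratio(toy_tokenize, "jeg liker brunost"))
```

A lower ratio means fewer tokens per word, hence fewer forward passes per sentence, which is the basis of the README's faster-inference claim.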