# Summary
A multilingual tokenizer trained on multilingual data with the SentencePiece library, using the BPE (byte-pair encoding) algorithm.
* vocab size: 100k
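BPE builds its vocabulary by repeatedly merging the most frequent pair of adjacent symbols. The toy pure-Python sketch below shows how merges are learned and then applied to a word. It is an illustration only, not SentencePiece's implementation, which also performs text normalization and handles character coverage. The `▁` prefix mimics SentencePiece's whitespace marker (U+2581). The function names `train_bpe` and `encode` are hypothetical, and the real tokenizer keeps merging until it reaches the 100k vocabulary size rather than stopping after a handful of merges.

```python
from collections import Counter


def get_pair_counts(words: dict[tuple[str, ...], int]) -> Counter:
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words: dict[tuple[str, ...], int], pair: tuple[str, str]) -> dict[tuple[str, ...], int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: dict[tuple[str, ...], int] = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a whitespace-split corpus (hypothetical helper)."""
    word_freqs = Counter(w for line in corpus for w in line.split())
    # Each word starts as a sequence of characters, prefixed with the "▁" whitespace marker.
    words = {tuple("\u2581" + w): f for w, f in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first pair seen
        words = merge_pair(words, best)
        merges.append(best)
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = tuple("\u2581" + word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)


merges = train_bpe(["low low low lower lowest"], num_merges=3)
print(merges)                    # [('▁', 'l'), ('▁l', 'o'), ('▁lo', 'w')]
print(encode("lowest", merges))  # ['▁low', 'e', 's', 't']
```

To tokenize with the trained model itself, load it through the `sentencepiece` Python package (`sentencepiece.SentencePieceProcessor`). The model file name is not stated in this README.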