
summary

A multilingual tokenizer trained on multilingual data with the SentencePiece library, using the BPE algorithm.

  • vocab size: 100k
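
As a rough illustration of the setup described above, the sketch below shows how a BPE tokenizer of this size could be trained with SentencePiece. It is not the author's actual training script. The corpus path, model prefix, and `character_coverage` value are hypothetical, and "100k" is assumed to mean exactly 100,000 entries. The helper names `build_training_args`, `train_tokenizer`, and `tokenize` are introduced here for illustration only.

```python
from typing import Any, Dict, List


def build_training_args(
    corpus_path: str,
    model_prefix: str,
    vocab_size: int = 100_000,  # assumption: "100k" read as exactly 100,000
) -> Dict[str, Any]:
    """Collect keyword arguments for SentencePieceTrainer.train."""
    return {
        "input": corpus_path,          # plain-text corpus, one sentence per line
        "model_prefix": model_prefix,  # writes <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,
        "model_type": "bpe",           # BPE, as stated in the summary
        "character_coverage": 0.9995,  # assumption: not stated in the README
    }


def train_tokenizer(args: Dict[str, Any]) -> str:
    """Train the model and return the path of the resulting .model file."""
    import sentencepiece as spm  # imported lazily; requires `pip install sentencepiece`

    spm.SentencePieceTrainer.train(**args)
    return args["model_prefix"] + ".model"


def tokenize(model_file: str, text: str) -> List[str]:
    """Load a trained model and split text into subword pieces."""
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=model_file)
    return sp.encode(text, out_type=str)
```

With a corpus at a hypothetical path such as `corpus.txt`, calling `train_tokenizer(build_training_args("corpus.txt", "multilingual_bpe"))` would produce `multilingual_bpe.model`, which `tokenize` can then load. A vocabulary of this size needs a sufficiently large and varied corpus, otherwise SentencePiece will refuse to train.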