Tokenisers trained on MiniPile. The `_raw_tokenisers` folder contains the original tokenisers, trained with a vocabulary size of 320k. Each of the other folders contains a transformers-compatible tokeniser with a smaller vocabulary size.
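Since each folder is transformers-compatible, one way to load a tokeniser is through `AutoTokenizer.from_pretrained` with its `subfolder` argument. The following is a minimal sketch, not a documented recipe for this repository: the helper name `load_tokeniser` is hypothetical, and the repository id and folder name must be supplied by the caller, since neither is specified here.

```python
def load_tokeniser(repo_id: str, subfolder: str):
    """Load one tokeniser folder from a Hub repository (illustrative helper).

    Assumptions: `repo_id` is the Hub id of this repository and `subfolder`
    is the name of one of its transformers-compatible folders. Both are
    placeholders provided by the caller, not values taken from this README.
    """
    # Imported lazily so that defining the helper does not require
    # the `transformers` package to be installed.
    from transformers import AutoTokenizer

    # `subfolder` tells transformers to look for the tokeniser files
    # inside that directory of the repository rather than at its root.
    return AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
```

Calling `load_tokeniser("<repo-id>", "<folder-name>")` with real values should return a tokeniser object whose `vocab_size` attribute can be compared against the folder's intended vocabulary size. The `_raw_tokenisers` folder is described only as holding the original tokenisers, so this helper may not work on it.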