Tokenisers trained on MiniPile. The `_raw_tokenisers` folder contains the original tokenisers, trained with a vocabulary size of 320k. Each of the other folders contains a `transformers`-compatible tokeniser with a smaller vocabulary size.
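
Since each tokeniser lives in its own folder, one way to load it is to pass the folder name through the `subfolder` argument of `AutoTokenizer.from_pretrained`. The snippet below is a minimal sketch, not an official recipe: the helper `load_tokeniser` is a hypothetical name, and the repository id and folder name are placeholders to be replaced with the actual values.

```python
def load_tokeniser(repo_id: str, folder: str):
    """Load one `transformers`-compatible tokeniser from a repository subfolder.

    Hypothetical helper: `repo_id` and `folder` are supplied by the caller and
    are not defined anywhere in this README.
    """
    # Imported lazily so that defining this helper does not require the library.
    from transformers import AutoTokenizer

    # `subfolder` points from_pretrained at a directory inside the repository.
    return AutoTokenizer.from_pretrained(repo_id, subfolder=folder)


# Example usage (placeholders, assumed; substitute the real repository id and
# one of the folder names in this repository):
# tok = load_tokeniser("<user>/<repo>", "<folder-name>")
# print(len(tok))                     # vocabulary size, including added tokens
# print(tok.tokenize("Hello world"))  # output depends on the chosen tokeniser
```

This assumes the folder holds the files `transformers` expects for a tokeniser. The `_raw_tokenisers` folder is described separately from the `transformers`-compatible ones, so it may need to be loaded differently.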