Updated https://huggingface.co/blog/nroggendorff/train-with-llama-architecture so you can "train" your own tokenizer from your dataset.
Very good!
Maybe a Colab!
Could this be used to extend a tokenizer with training? I'd like to update my Mistral tokenizer to include foreign characters, such as Hebrew, Amharic, and Hindi.
I'm pretty sure you can add additional tokens and special tokens, so I suppose so.
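A rough sketch of both ideas using the `tokenizers` library (the tiny corpus and the added token here are made up for illustration): train a small BPE tokenizer from an iterator over your data, then extend the trained tokenizer with an extra token for a script it hasn't seen.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Tiny stand-in corpus; in practice, iterate over your own dataset.
corpus = [
    "hello world",
    "שלום עולם",      # Hebrew
    "नमस्ते दुनिया",    # Hindi
]

# Train a small BPE tokenizer from scratch on the corpus.
tok = Tokenizer(BPE(unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]"])
tok.train_from_iterator(corpus, trainer)

# Extend the trained tokenizer with a token not in the corpus.
# add_tokens returns how many tokens were actually new.
added = tok.add_tokens(["नमस्कार"])

# The added token is now matched as a single unit before the BPE model runs.
print(added, tok.encode("नमस्कार").tokens)
```

For an existing Mistral tokenizer loaded through `transformers`, the analogous pattern would be `tokenizer.add_tokens([...])` followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix covers the new ids; fast tokenizers also have `train_new_from_iterator` if you want to retrain the vocabulary on your own corpus while keeping the original tokenizer's configuration.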