Question about the new tokenizer.
#9
by
hyunseoki
- opened
Thank you for your great work!!
I'm curious how you produced the new tokenizer with the added Korean vocabulary.
I wonder whether the new Korean vocabulary might contain duplicates of entries already in the original Llama tokenizer.
The new Korean vocab does not duplicate the original Llama tokenizer's, since I used the add_new_vocab
method in Tokenizers, which explicitly skips adding pre-existing vocab.
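The skip-if-present behavior described above can be illustrated with a small standalone sketch (plain Python, not the actual Tokenizers API; the function name and sample tokens here are hypothetical):

```python
def add_new_tokens(vocab: dict, candidates: list) -> int:
    """Add candidate tokens to `vocab`, skipping any that already exist.

    Mirrors the deduplication described above: entries already in the
    base (Llama) vocab are never added twice. `vocab` maps token -> id.
    Returns the number of tokens actually added.
    """
    added = 0
    next_id = max(vocab.values()) + 1 if vocab else 0
    for tok in candidates:
        if tok in vocab:  # already in the base vocab: skip it
            continue
        vocab[tok] = next_id
        next_id += 1
        added += 1
    return added

# Hypothetical example: one candidate overlaps with the base vocab.
base = {"<s>": 0, "the": 1, "안": 2}
korean = ["안", "녕", "하세요"]  # "안" is already present
n = add_new_tokens(base, korean)
print(n, len(base))  # → 2 5 (one duplicate skipped)
```

For what it's worth, the Hugging Face `transformers` tokenizers expose a similar guarantee through `tokenizer.add_tokens(...)`, which likewise skips tokens already in the vocabulary and returns only the count of newly added ones.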
beomi
changed discussion status to
closed