Tokenizer is broken?
#1
by
mpasila
- opened
I tried loading it in 4-bit with bitsandbytes and it gives this error:
```
Traceback (most recent call last):
  File "C:\Users\pasil\text-generation-webui\server.py", line 223, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\text-generation-webui\modules\models.py", line 92, in load_model
    tokenizer = load_tokenizer(model_name, model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\text-generation-webui\modules\models.py", line 111, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 751, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\transformers\tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\transformers\tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\transformers\models\llama\tokenization_llama.py", line 141, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\transformers\models\llama\tokenization_llama.py", line 166, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\sentencepiece\__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pasil\anaconda3\envs\textgen\Lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: not a string
```
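For context, a likely cause (an assumption on my part, not confirmed in this thread) is that the slow LLaMA tokenizer hands its `vocab_file` attribute straight to sentencepiece, and when the repo has no `tokenizer.model` file that attribute is `None`, which sentencepiece rejects with exactly this `TypeError: not a string`. A minimal stand-in illustrating the failure mode (`load_from_file` is a hypothetical mock, not the real sentencepiece code):

```python
# Hypothetical stand-in for sentencepiece's LoadFromFile: it only
# accepts a string path, so a vocab_file left as None (e.g. because
# tokenizer.model is missing) raises the same "not a string" error.
def load_from_file(model_file):
    if not isinstance(model_file, str):
        raise TypeError("not a string")
    return "loaded " + model_file

vocab_file = None  # what the slow tokenizer ends up with when tokenizer.model is absent
try:
    load_from_file(vocab_file)
except TypeError as exc:
    print(exc)  # -> not a string
```

The fast (Rust-based) tokenizer loads from `tokenizer.json` instead, which is why the slow-path sentencepiece load is where this surfaces.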
@mpasila can you try loading the tokenizer with `AutoTokenizer` from `transformers`? That should work.
I tried loading it again today and now it loads just fine, so I'm not sure what happened last time. Oobabooga's text-generation-webui already does what you suggested, so that shouldn't have been the problem.
mpasila changed discussion status to closed