problem loading tokenizer
I'm seeing the following issue:
>>> from transformers import AutoTokenizer
>>> model_id = 'echo840/Monkey'
>>> tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer_config.json: 100%|███████████████████████████████████████| 288/288 [00:00<00:00, 2.87MB/s]
tokenization_qwen.py: 100%|████████████████████████████████████| 21.3k/21.3k [00:00<00:00, 84.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/echo840/Monkey:
- tokenization_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
qwen.tiktoken: 100%|███████████████████████████████████████████| 2.56M/2.56M [00:00<00:00, 30.2MB/s]
special_tokens_map.json: 100%|████████████████████████████████████| 35.0/35.0 [00:00<00:00, 333kB/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mashton/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 829, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/home/mashton/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/home/mashton/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "hf_home/modules/transformers_modules/echo840/Monkey/e12c9762d453211a1f3d8f5545b3bbfd70d4d1b7/tokenization_qwen.py", line 114, in __init__
    super().__init__(**kwargs)
  File "/home/mashton/.local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "hf_home/modules/transformers_modules/echo840/Monkey/e12c9762d453211a1f3d8f5545b3bbfd70d4d1b7/tokenization_qwen.py", line 217, in _add_tokens
    if surface_form not in SPECIAL_TOKENS + self.IMAGE_ST:
AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'
It seems that super().__init__(**kwargs) calls _add_tokens() before self.IMAGE_ST has been set.
I tried this with transformers 4.39.2 and 4.40.0.dev0, with the same result.
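The ordering problem described above can be reproduced without transformers at all. The sketch below is a minimal, self-contained illustration, assuming the usual remedy of assigning every attribute an overridden hook depends on *before* calling super().__init__(); it is not the actual patch applied to tokenization_qwen.py. BaseTokenizer, BrokenTokenizer, FixedTokenizer and the token strings are all hypothetical stand-ins for PreTrainedTokenizer and QWenTokenizer.

```python
# Hypothetical stand-ins only: BaseTokenizer is NOT transformers' PreTrainedTokenizer,
# and these classes are NOT the real Monkey/QWen tokenizer code.

SPECIAL_TOKENS = ("<|endoftext|>",)  # hypothetical special-token tuple


class BaseTokenizer:
    """Base class whose __init__ calls an overridable hook, as in the traceback."""

    def __init__(self, added_tokens=()):
        self.added = []
        # The base constructor calls _add_tokens(), which subclasses may override.
        self._add_tokens(list(added_tokens))

    def _add_tokens(self, new_tokens):
        self.added.extend(new_tokens)
        return len(new_tokens)


class BrokenTokenizer(BaseTokenizer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)           # _add_tokens() runs here...
        self.IMAGE_ST = ("<img>", "</img>")  # ...but IMAGE_ST only exists afterwards

    def _add_tokens(self, new_tokens):
        # Skip tokens that are already special; needs self.IMAGE_ST to exist.
        kept = [t for t in new_tokens if t not in SPECIAL_TOKENS + self.IMAGE_ST]
        return super()._add_tokens(kept)


class FixedTokenizer(BaseTokenizer):
    def __init__(self, **kwargs):
        # Set everything the overridden hook needs BEFORE the base __init__ runs.
        self.IMAGE_ST = ("<img>", "</img>")
        super().__init__(**kwargs)

    def _add_tokens(self, new_tokens):
        kept = [t for t in new_tokens if t not in SPECIAL_TOKENS + self.IMAGE_ST]
        return super()._add_tokens(kept)


if __name__ == "__main__":
    try:
        BrokenTokenizer(added_tokens=["<img>", "<extra>"])
    except AttributeError as err:
        print("BrokenTokenizer failed:", err)

    tok = FixedTokenizer(added_tokens=["<img>", "<extra>"])
    print(tok.added)  # ['<extra>']
```

Note that BrokenTokenizer only fails when at least one token is passed in, because the list comprehension never evaluates self.IMAGE_ST for an empty list; this is why the bug depends on what the base constructor hands to _add_tokens().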
Hello, you should either use transformers==4.32.0 or refer to this link for a fix: https://huggingface.co/echo840/Monkey-Chat/discussions/1.
Given the copyright terms, I don't think I can share my fix even if I write one. I'm trying to add support for Monkey to another project, but I am not sure whether I can redistribute a modified (fixed) version of Monkey. Would you be willing to fix it upstream? The workaround seems simple enough and should not harm any existing users.
transformers continues to be updated, so I cannot pin transformers==4.32.0 for my project, and if I can't share a fix... what can others do?
Thank you! I have resolved the issue. Please give it another try, and let me know if you have any questions.
It's working now, thank you! I've added support for it to my project: https://github.com/matatonic/openedai-vision
Congratulations, it works very well and thanks again!
That sounds great! Thank you for your contribution.