Processor config change leads to errors

#37
by 7AtAri - opened

Since the processor config was changed a few days ago, my code throws an error that I can't seem to fix:

File "/opt/conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
return self.model.forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 873, in forward
inputs_embeds, attention_mask, position_ids, labels, _ = self._merge_input_ids_with_image_features(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 551, in _merge_input_ids_with_image_features
raise ValueError(
ValueError: Number of image tokens in input_ids (2040) different from num_images (8).

I use a batch_size of 8, but now the image tokens do not seem to match anymore.
A workaround would be to use an older version of the processor, but I realized that neither the model nor the processor are versioned. I am in the middle of fine-tuning and was loading from a checkpoint when this occurred.
Since I did not change my code, and I had loaded from other checkpoints before without problems, the issue should be the changed config or some other change to the processor within the last two weeks.
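
For reference, here is a minimal way to check what the processor now produces (a sketch; images and prompts stand for your own batch of 8 PIL images and the matching prompt strings):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
batch = processor(images=images, text=prompts, padding=True, return_tensors="pt")

# Count how many <image> tokens ended up in input_ids. Under the old behavior
# this should equal the number of images (8); with the changed processor it
# comes out at 2040, which is exactly the mismatch the ValueError reports.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
print((batch["input_ids"] == image_token_id).sum().item())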

Could you add an older version of the processor, or give me a hint on how to fix this?

Thanks

Do you have an idea how to fix that, bro?

Llava Hugging Face org
edited Nov 24

Please update transformers and follow https://github.com/google-research/scenic/tree/main/scenic/projects/vid2seq if you have any further issues. We have been updating the LLaVA models lately, and yesterday we moved to using the new non-legacy code.

The reason is that input_ids is now automatically padded with "image" tokens by the processor, so the number of image tokens in input_ids becomes much larger than num_images, which triggers the "Number of image tokens in input_ids" error.
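
Schematically, the difference looks like this (a sketch; the concrete token ids below are made up for illustration):

# Legacy processor: one <image> placeholder per image; the model expands it internally.
image_token_id = 32000  # hypothetical id for <image>
legacy_ids = [1, 733, 16289, 28793, image_token_id, 28705, 13]

# New processor: the placeholder is pre-expanded to one token per image patch,
# so a batch of 8 images can yield 2040 image tokens (255 per sample).
new_ids = [1, 733, 16289, 28793] + [image_token_id] * 255 + [28705, 13]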

After the transformers update the error persists. The problem must be the processor config. I load with AutoProcessor as before, but the result seems to be different now.

Is there a reason why the LLaVA models are not versioned? It would really be helpful if I could just use the older version.

Also, I looked into the link you provided and did not understand how it relates to my problem.
I am working with single images, not with videos.

"The reason is that input_ids is now automatically padded with image tokens by the processor"

This could be the problem. How can I undo it?

I have a temporary solution: fork this Hugging Face repo under your own account and check out an older commit to restore the previous processor and tokenizer config.

@7AtAri you can use TrgTuan10/llava-v1.6-mistral-7b-hf
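
For example (a sketch, loading the fork with AutoProcessor):

from transformers import AutoProcessor

# The fork is pinned to the pre-change processor/tokenizer config.
processor = AutoProcessor.from_pretrained("TrgTuan10/llava-v1.6-mistral-7b-hf")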

Thank you a lot! This version works!

Llava Hugging Face org

Hi,

No need to fork the repository; you can specify the commit hash via the revision argument of the from_pretrained method:

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", revision="2f7f20bda2e7af8e54438fec01ac5214e9ac6f92")
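
If your fine-tuning checkpoint was created against the old behavior, it can also help to pin the model weights to the same revision (a sketch; LlavaNextForConditionalGeneration is the class this model's integration uses):

from transformers import AutoProcessor, LlavaNextForConditionalGeneration

repo = "llava-hf/llava-v1.6-mistral-7b-hf"
rev = "2f7f20bda2e7af8e54438fec01ac5214e9ac6f92"

# Load processor and model from the same commit so their configs stay consistent.
processor = AutoProcessor.from_pretrained(repo, revision=rev)
model = LlavaNextForConditionalGeneration.from_pretrained(repo, revision=rev)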
