Processor config change leads to errors

#37
by 7AtAri - opened

Since the processor config was changed a few days ago, my code throws an error that I can't seem to fix:

File "/opt/conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
return self.model.forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 873, in forward
inputs_embeds, attention_mask, position_ids, labels, _ = self._merge_input_ids_with_image_features(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 551, in _merge_input_ids_with_image_features
raise ValueError(
ValueError: Number of image tokens in input_ids (2040) different from num_images (8).

I use a batch_size of 8, but now the image tokens do not seem to match anymore.
A workaround would be to use an older version of the processor, but I realized that neither the model nor the processor are versioned. I am in the middle of fine-tuning and was loading from a checkpoint when this occurred.
Since I did not change my code, and I had loaded from other checkpoints before without problems, the issue should be the changed config or some other change to the processor within the last two weeks.
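
For reference, here is a minimal way to check what the processor now produces (a sketch; images and prompts stand for your own batch of 8 PIL images and the matching prompt strings):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
batch = processor(images=images, text=prompts, padding=True, return_tensors="pt")

# Count how many <image> tokens ended up in input_ids. Under the old behavior
# this should equal the number of images (8); with the changed processor it
# comes out at 2040, which is exactly the mismatch the ValueError reports.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
print((batch["input_ids"] == image_token_id).sum().item())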

Could you add an older version of the processor, or give me a hint on how to fix this?

Thanks

Do you have an idea how to fix that, bro?

Llava Hugging Face org
edited Nov 24

Please update transformers and follow https://github.com/google-research/scenic/tree/main/scenic/projects/vid2seq if you have any further issues. We have been updating the LLaVA models lately, and yesterday we moved to using the new non-legacy code.

The reason is that input_ids is now automatically padded with "image" tokens by the processor, so the number of image tokens in input_ids becomes much larger than num_images, which triggers the "Number of image tokens in input_ids" error.
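
Schematically, the difference looks like this (a sketch; the concrete token ids below are made up for illustration):

# Legacy processor: one <image> placeholder per image; the model expands it internally.
image_token_id = 32000  # hypothetical id for <image>
legacy_ids = [1, 733, 16289, 28793, image_token_id, 28705, 13]

# New processor: the placeholder is pre-expanded to one token per image patch,
# so a batch of 8 images can yield 2040 image tokens (255 per sample).
new_ids = [1, 733, 16289, 28793] + [image_token_id] * 255 + [28705, 13]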

After the transformers update the error persists. The problem must be the processor config. I load with AutoProcessor as before, but the result seems to be different now.

Is there a reason why the LLaVA models are not versioned? It would really be helpful if I could just use the older version.

Also, I looked into the link you provided and did not understand how it relates to my problem.
I am working with single images, not with videos.

"The reason is that input_ids is now automatically padded with image tokens by the processor"

This could be the problem. How can I undo it?

I have a temporary solution: fork this Hugging Face repo under your own account and check out an older commit to restore the previous processor and tokenizer config.

@7AtAri you can use TrgTuan10/llava-v1.6-mistral-7b-hf
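
For example (a sketch, loading the fork with AutoProcessor):

from transformers import AutoProcessor

# The fork is pinned to the pre-change processor/tokenizer config.
processor = AutoProcessor.from_pretrained("TrgTuan10/llava-v1.6-mistral-7b-hf")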

Thank you a lot! This version works!

Llava Hugging Face org

Hi,

No need to fork the repository; you can specify the commit hash via the revision argument of the from_pretrained method:

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", revision="2f7f20bda2e7af8e54438fec01ac5214e9ac6f92")
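
If your fine-tuning checkpoint was created against the old behavior, it can also help to pin the model weights to the same revision (a sketch; LlavaNextForConditionalGeneration is the class this model's integration uses):

from transformers import AutoProcessor, LlavaNextForConditionalGeneration

repo = "llava-hf/llava-v1.6-mistral-7b-hf"
rev = "2f7f20bda2e7af8e54438fec01ac5214e9ac6f92"

# Load processor and model from the same commit so their configs stay consistent.
processor = AutoProcessor.from_pretrained(repo, revision=rev)
model = LlavaNextForConditionalGeneration.from_pretrained(repo, revision=rev)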
