Lucas Beyer
giffmana's activity
Is SiglipImageProcessor configured correctly?
Question About SigLIP 2’s Performance with Newline-Separated Labels

I'm not sure what's going on, as I'm not familiar with this codebase (and have no time to dig in), but for SigLIP what you're supposed to compute is sigmoid(zimg @ ztxt * temperature + bias).
From what you describe, I would bet the bias and/or temperature are missing.
The ground-truth reference code is https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP2_demo.ipynb
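For illustration, here is a minimal NumPy sketch of that scoring formula. It is not the reference implementation (the notebook linked above is). It assumes the embeddings are L2-normalized before the dot product and that `temperature` is already the multiplicative scale; some implementations store its logarithm and exponentiate first. The function name `siglip_probs` and the numeric values are made up for the example.

```python
import numpy as np

def siglip_probs(zimg, ztxt, temperature, bias):
    """Pairwise match probabilities, shape (n_images, n_texts).

    Assumes `temperature` is the multiplicative scale (not its log)
    and that embeddings should be L2-normalized before the dot product.
    """
    zimg = zimg / np.linalg.norm(zimg, axis=-1, keepdims=True)
    ztxt = ztxt / np.linalg.norm(ztxt, axis=-1, keepdims=True)
    logits = zimg @ ztxt.T * temperature + bias
    # Independent sigmoid per image-text pair, not a softmax over labels.
    return 1.0 / (1.0 + np.exp(-logits))

# Toy data standing in for real model outputs.
rng = np.random.default_rng(0)
zimg = rng.normal(size=(2, 8))   # 2 image embeddings
ztxt = rng.normal(size=(3, 8))   # 3 text embeddings
probs = siglip_probs(zimg, ztxt, temperature=10.0, bias=-5.0)
```

Dropping `bias` or `temperature` shifts or flattens every logit, so all scores end up compressed into a narrow range, which would match the symptom of labels being hard to tell apart.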

Sorry, I can't collaborate on individuals' papers.

Ah sorry, that wasn't clear from your message. I'm not familiar enough with this codebase to help more.

The warning gives you the answer: pass max_length=64

Yes. If you want longer text, what I'd do is chunk it into pieces of 64 tokens (possibly even overlapping), embed those separately, and then either average their embeddings, or dot each of them with the image embedding individually and take the max or average score, depending on your use case.
I'm actually curious what kind of queries you're dealing with that are longer than 64 tokens. Almost all use cases of SigLIP I can think of fit well below 64.
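A rough sketch of that chunk-and-aggregate idea, in NumPy. The helper names `chunk_tokens` and `score_long_text`, the overlap of 16 tokens, and the toy embeddings are all assumptions for illustration. In practice each chunk would go through the text tower (padded to 64 tokens) to produce `chunk_embs`, and the resulting similarity would still be passed through the temperature, bias, and sigmoid.

```python
import numpy as np

def chunk_tokens(token_ids, size=64, overlap=16):
    """Split a token list into windows of at most `size`, sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + size])
        if start + size >= len(token_ids):
            break  # the last window already reaches the end of the text
    return chunks

def score_long_text(zimg, chunk_embs, mode="max"):
    """Cosine similarity between one image and a chunked text.

    zimg: (d,) image embedding; chunk_embs: (n_chunks, d) text embeddings.
    mode: "max" or "mean" over per-chunk scores, or "mean_embedding"
    to average the chunk embeddings first and score once.
    """
    zimg = zimg / np.linalg.norm(zimg)
    chunk_embs = chunk_embs / np.linalg.norm(chunk_embs, axis=-1, keepdims=True)
    if mode == "mean_embedding":
        pooled = chunk_embs.mean(axis=0)
        return float(zimg @ (pooled / np.linalg.norm(pooled)))
    sims = chunk_embs @ zimg
    if mode == "max":
        return float(sims.max())
    if mode == "mean":
        return float(sims.mean())
    raise ValueError(f"unknown mode: {mode}")

# Toy example: 150 token ids and two hand-made chunk embeddings.
chunks = chunk_tokens(list(range(150)))
zimg = np.array([1.0, 0.0])
chunk_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
best = score_long_text(zimg, chunk_embs, mode="max")
```

Taking the max suits retrieval-style queries where any one part of the text matching the image is enough. Averaging suits cases where the whole text should describe the image.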