Do you have a 224x224 or 384x384 pretrained CLIP model?

#42
by bojohn - opened

Thanks for your great work. It is useful. But I found the image encoder is a little slow at inference time, so do you have a 224x224 or 384x384 pretrained CLIP model?

+1. I can't run inference on images in a reasonable time because of their size. The model can't fit into RAM, so I can't use all the available GPUs.

Jina AI org

Thanks for your feedback! We do not offer a lower-resolution version of this model. However, if you don't need multilinguality, you can check https://huggingface.co/jinaai/jina-clip-v1, a smaller model with a lower input resolution and similar performance on English text and cross-modal tasks.
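For reference, here is a minimal usage sketch for that smaller model, assuming the `trust_remote_code` loading path and the `encode_text`/`encode_image` helpers shown on the jina-clip-v1 model card (`cat.jpg` is a hypothetical local file):

```python
from transformers import AutoModel

# jina-clip-v1: English-only, smaller backbone, 224x224 input resolution.
model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)

# The custom model code exposes convenience encoders for both modalities.
text_embeddings = model.encode_text(['a photo of a cat'])
image_embeddings = model.encode_image(['cat.jpg'])  # placeholder image path
```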

That said, you can improve inference speed by using bf16, xFormers, and FlashAttention. You can also try a higher patch dropout rate to drop more image patches before processing. If the model is still slow, I suggest trying the ONNX model and the quantized versions.
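Here is a minimal sketch of the bf16 suggestion. Assumptions: the model loads through the standard `AutoModel.from_pretrained` path with `trust_remote_code=True`, accepts the usual `torch_dtype` argument, and exposes `encode_image` as described on the model card; `photo.jpg` is a hypothetical local file.

```python
import torch
from transformers import AutoModel

# Load the model in bfloat16 to roughly halve memory use and speed up
# inference on GPUs with native bf16 support (Ampere or newer).
model = AutoModel.from_pretrained(
    'jinaai/jina-clip-v2',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to('cuda').eval()

with torch.inference_mode():
    # encode_image accepts image URLs or PIL images per the model card;
    # 'photo.jpg' is a placeholder input.
    embeddings = model.encode_image(['photo.jpg'])
```

For the ONNX route, you can load the exported weights with onnxruntime and inspect the expected graph inputs before wiring up preprocessing. The exact file name and input names depend on the repo's ONNX export, so treat this as a starting point only:

```python
import onnxruntime as ort

# Hypothetical path; check the repo's onnx/ folder for the actual file names.
session = ort.InferenceSession('onnx/model.onnx', providers=['CPUExecutionProvider'])
print([(i.name, i.shape) for i in session.get_inputs()])  # inspect expected inputs
```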
