Is there any inference server which can support Phi-3-vision-128K-instruct?

#49, opened by farzanehnakhaee70

Is there any inference server like Ollama or TGI which can support this model?

Maybe sglang can serve it. It supports LLaVA-NeXT, so I think a little bit of modification could make it serve Phi-3-vision.
Oh, also, vLLM now supports Phi-3-vision. You can see the PR here: https://github.com/vllm-project/vllm/pull/4986
You will need to install vLLM from source.
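
Once it's built, offline inference should look roughly like this (an untested sketch; the multimodal API and the exact prompt format have shifted between vLLM versions, and the image path is just a placeholder):

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Phi-3-vision needs trust_remote_code; max_model_len keeps KV-cache memory in check.
llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,
    max_model_len=8192,
)

# Phi-3-vision chat format: an image placeholder tag followed by the user turn.
prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")  # placeholder image path

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```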

@farzanehnakhaee70 we have support in mistral.rs with multi-batch inference, in-situ quantization (ISQ), and Python, OpenAI-compatible, and other APIs: https://github.com/EricLBuehler/mistral.rs/blob/master/docs%2FPHI3V.md
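
Because the server is OpenAI-compatible, any standard OpenAI client can talk to it once it's running. A minimal sketch, assuming the server is listening on localhost:1234 and registers the model under the id `phi3v` (see the PHI3V doc above for the actual launch command, port, and model id):

```python
from openai import OpenAI

# Assumes a mistral.rs OpenAI-compatible server is already running locally;
# the base_url, api_key, and model id below are placeholders.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="phi3v",  # hypothetical id; use whatever the server reports
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},  # placeholder URL
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```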

@EricB Thanks for the implementation; however, it seems to be a bit slow even when using ISQ. It takes about the same amount of time as using just the transformers library. Is there something I'm missing?
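
For reference, the transformers-only baseline I'm comparing against is roughly this (adapted from the model card; the image path and generation length are placeholders):

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder image path
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens before decoding so only the model's reply is printed.
reply = out[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(reply, skip_special_tokens=True)[0])
```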
