Conversion Approach
Hello,
I have been trying to serve an ONNX quantized version of GLiNER on OpenVINO, using both the ONNX file itself and the IR version, but it fails in either case. For the ONNX model, when I send a request to OVMS, the OpenVINO logs show unsupported operations, so model loading fails. For the IR format, when I try to run inference, the model expects inputs that are not part of the preprocessing pipeline output.
So, could you share what you did, or give some hints that could help?
Hello!
Sorry for taking so long to reply; I have been really busy.
I am preparing a GitHub repo covering different ways of deploying GLiNER (Docker containers, Kubernetes, etc.), but as I said, it has taken longer than expected.
If you are still trying to get it working, here are the basics:
- Choose your serving strategy: for simplicity, let's use the FastAPI approach (a minimal wrapper is sketched at the end of this reply).
- Take a look at the original Python package available at: https://github.com/urchade/GLiNER
It already has everything you need to load an ONNX version of the model from a Python script (the latest changes include GPU support with ONNX files). However, if you want to try OpenVINO, you will need to rewrite some classes (by subclassing them). Basically, you need to keep all the preprocessing and postprocessing available in methods like this one: https://github.com/urchade/GLiNER/blob/main/gliner/model.py#L183 (GLiNER.prepare_model_inputs). These methods help you build the dictionary of tensors that both the ONNX file and the IR version expect (see the inference sketch below).
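If it helps, here is a rough sketch of that idea. It assumes the IR files are named gliner_model.xml/.bin, the checkpoint is urchade/gliner_base, and it uses prepare_model_inputs as a stand-in for whatever preprocessing method your GLiNER version exposes (check the link above for the exact name and signature), so treat it as a starting point rather than working code:

```python
from openvino.runtime import Core
from gliner import GLiNER

# Load the original model once so we can reuse its tokenizer and the
# pre/post-processing helpers; swap in whichever checkpoint you exported.
gliner_model = GLiNER.from_pretrained("urchade/gliner_base")

# Compile the IR produced by the conversion (paths are examples).
core = Core()
compiled = core.compile_model("gliner_model.xml", "CPU")
infer_request = compiled.create_infer_request()

text = "Steve Jobs founded Apple in Cupertino."
labels = ["person", "organization", "location"]

# Hypothetical call: build the dictionary of tensors the exported graph expects
# (input_ids, attention_mask, span indices, ... depending on the export).
# Check the exact method name and signature in the linked model.py.
model_inputs = gliner_model.prepare_model_inputs([text], labels)

# OpenVINO wants numpy arrays keyed by the graph's input names; convert any
# torch tensors and make sure the keys match your exported model's inputs.
ov_inputs = {
    name: (tensor.cpu().numpy() if hasattr(tensor, "cpu") else tensor)
    for name, tensor in model_inputs.items()
}

raw_outputs = infer_request.infer(ov_inputs)
print(raw_outputs)  # feed these back into GLiNER's postprocessing to get entities
```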
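And a minimal sketch of the FastAPI layer mentioned in the first point; the endpoint path, request schema, and the run_gliner_openvino helper are all illustrative (not part of GLiNER), so wire in your own inference function there:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ExtractionRequest(BaseModel):
    text: str
    labels: list[str]

def run_gliner_openvino(text: str, labels: list[str]) -> list[dict]:
    # Placeholder: call your OpenVINO-backed GLiNER pipeline here and return
    # the postprocessed entities (spans, labels, scores).
    return []

@app.post("/extract")
def extract(req: ExtractionRequest):
    entities = run_gliner_openvino(req.text, req.labels)
    return {"entities": entities}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
```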
Hope it helps! If not, feel free to post any questions or comments here.