ONNX decoder model uses non-standard operators
The decoder_model_merged.onnx ONNX model uses non-standard operators which are only available in ONNX Runtime, e.g. SkipSimplifiedLayerNormalization. This means it can't be used with other runtimes. The other models (vision encoder, token embed) use only standard operators. Would it be possible to upload a version of the decoder that uses only standard operators?
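For anyone who wants to check which operators are affected, here is a minimal sketch that lists every node whose operator lives outside the default ONNX domain. It assumes the `onnx` Python package is installed; the helper name `nonstandard_ops` is made up for this example, and treating only the default domain (`""` or `ai.onnx`) as "standard" is a simplification. ONNX Runtime's contrib operators such as SkipSimplifiedLayerNormalization are registered under the `com.microsoft` domain, so they show up in the output.

```python
import sys
from collections import Counter

# Assumption for this sketch: only the default ONNX domain counts as "standard".
STANDARD_DOMAINS = {"", "ai.onnx"}


def nonstandard_ops(graph, counts=None):
    """Count (domain, op_type) pairs that fall outside the default ONNX domain.

    `graph` is an onnx GraphProto (or any object with a `.node` sequence).
    Subgraphs stored in node attributes (e.g. the branches of an If node)
    are visited recursively.
    """
    counts = Counter() if counts is None else counts
    for node in graph.node:
        if node.domain not in STANDARD_DOMAINS:
            counts[(node.domain, node.op_type)] += 1
        for attr in getattr(node, "attribute", ()):
            sub = getattr(attr, "g", None)  # single subgraph attribute
            if sub is not None:
                nonstandard_ops(sub, counts)
            for sub in getattr(attr, "graphs", ()):  # list-of-subgraphs attribute
                nonstandard_ops(sub, counts)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    import onnx  # third-party: pip install onnx

    # Only the graph structure is needed, so skip loading external weight files.
    model = onnx.load(sys.argv[1], load_external_data=False)
    for (domain, op_type), n in sorted(nonstandard_ops(model.graph).items()):
        print(f"{domain}::{op_type} x{n}")
```

Saved as, say, `list_ops.py`, it could be run as `python list_ops.py decoder_model_merged.onnx`. An empty output would mean every node uses the default domain, although a given runtime may still lack some standard operators or opset versions.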
@robertknight Could you share your inference code? I haven't found any yet for the ONNX versions, and having it would speed things up a lot.
I don't have working inference code yet. I was testing the models with a tool I use which generates random inputs. There is, however, a PR in transformers.js which I believe was intended for a previous version of the model: https://github.com/huggingface/transformers.js/pull/1059.
This is as far as I got:
https://gist.github.com/dnhkng/a7e9914e4f039c1063b0b692ae9a87a2
The ONNX vision and text models generate the correct embeddings, but I'm not sure how the decoder should work.
I posted some sample code to https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct/discussions/4#67a0e3cf042d0e5936dac100. Hope it helps!