ONNX model generating vector with 768 dimensions instead of 512
Hello, I'm using Spring AI in a Spring Boot + Java application to transform text into a vector and then query OpenSearch. The problem: the model is generating a vector with 768 dimensions instead of 512. Here's my configuration for more context:
spring:
  ai:
    embedding:
      transformer:
        onnx:
          model-uri: https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1/resolve/main/onnx/model.onnx
        tokenizer:
          uri: https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1/resolve/main/tokenizer.json
I also tried model_01, model_02, model_03, and model_04; same problem.
I think it has something to do with the down projection mentioned here: https://github.com/UKPLab/sentence-transformers/issues/247#issuecomment-633845810
But I don't know how to solve this or what to do. Can anyone help?
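For context, the issue linked above describes the model as a 768-dim transformer followed by a Dense down-projection to 512 dims (a separate `2_Dense` module in the sentence-transformers checkpoint). If the exported ONNX graph omits that Dense layer, you get the raw 768-dim output. Conceptually, the missing step is just a matrix multiply plus bias and activation; a pure-Python sketch with toy weights (the real weights and the tanh activation are assumptions based on the linked issue, not taken from the ONNX file):

```python
import math
import random

def dense_down_projection(pooled, weight, bias):
    """Apply a Dense layer tanh(W @ x + b), projecting 768 -> 512 dims.

    pooled: list of 768 floats (pooled transformer output)
    weight: 512 x 768 matrix (list of rows)
    bias:   list of 512 floats
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, pooled)) + b)
        for row, b in zip(weight, bias)
    ]

# Toy random weights just to show the shape change; the real values live in
# the model's 2_Dense module.
pooled = [random.random() for _ in range(768)]
weight = [[random.uniform(-0.01, 0.01) for _ in range(768)] for _ in range(512)]
bias = [0.0] * 512

projected = dense_down_projection(pooled, weight, bias)
print(len(projected))  # 512
```

This is only to illustrate why the two outputs differ in size; in practice you want an export that already contains the projection (see the updates below).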
UPDATE 2: 2025-05-20
Sorry, my approach under UPDATE 1 (2025-05-20) does not work with the model.onnx provided at https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1/tree/main/onnx. However, it works with a model exported with optimum-cli in Python:
python -m pip install optimum[onnxruntime]
python -m pip install sentence-transformers
optimum-cli export onnx -m sentence-transformers/distiluse-base-multilingual-cased-v1 --task feature-extraction any/folder/you/want
This creates config.json, model.onnx, special_tokens_map.json, tokenizer_config.json, tokenizer.json, and vocab.txt under the specified folder any/folder/you/want.
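You can then point the Spring AI properties at the exported files instead of the Hugging Face URLs. Spring resolves these as resource URIs, so a file: URI should work (a sketch assuming the export landed in any/folder/you/want — adjust the paths for your setup):

```yaml
spring:
  ai:
    embedding:
      transformer:
        onnx:
          model-uri: file:any/folder/you/want/model.onnx
        tokenizer:
          uri: file:any/folder/you/want/tokenizer.json
```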
UPDATE 1: 2025-05-20
I use Python, not Java, but I found out that the model.run(...) call produces a list with two elements.
import onnxruntime as ort
onnx_input = {
"input_ids": ...,
"attention_mask": ...
}
model = ort.InferenceSession("path/to/onnx/model.onnx")
model_output = model.run(None, onnx_input) # model_output is a list with two arrays. First array has shape of (n, 768), second has shape of (n, 512)
print(model_output[0].shape) # -> (n, 768)
print(model_output[1].shape) # -> (n, 512)
I compared the embeddings from model_output[1] with the embeddings produced by SentenceTransformers, using the script from https://github.com/huggingface/optimum/issues/1519#issuecomment-1845696365 (under "Compare both original and onnx model output"), and they matched.
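If you want to repeat the check without the full linked script, the comparison essentially boils down to cosine similarity between the two embedding vectors, which should be very close to 1.0 when the outputs match. A minimal pure-Python version (the stand-in vectors below are placeholders for your real embeddings, not actual model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-ins: replace with model.encode(text) from SentenceTransformers
# and model_output[1][0] from the ONNX session.
st_embedding = [0.1, 0.2, 0.3]
onnx_embedding = [0.1, 0.2, 0.3]

# Identical vectors give a similarity of ~1.0.
print(cosine_similarity(st_embedding, onnx_embedding))
```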