ONNX model generating vector with 768 dimensions instead of 512
Hello, I'm using Spring AI in a Spring Boot + Java application to transform text into a vector and then query OpenSearch. The problem: the model is generating a vector with 768 dimensions instead of 512. Here's my configuration for more context:
spring:
  ai:
    embedding:
      transformer:
        onnx:
          model-uri: https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1/resolve/main/onnx/model.onnx
        tokenizer:
          uri: https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1/resolve/main/tokenizer.json
I also tried model_01, model_02, model_03, and model_04; same problem.
I think it has something to do with the down projection mentioned here: https://github.com/UKPLab/sentence-transformers/issues/247#issuecomment-633845810
But I don't know how to solve this or what to do. Can anyone help?
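For context, the issue linked above describes the model as a 768-dim transformer followed by a Dense down-projection to 512 dims (a separate `2_Dense` module in the sentence-transformers checkpoint). If the exported ONNX graph omits that Dense layer, you get the raw 768-dim output. Conceptually, the missing step is just a matrix multiply plus bias and activation; a pure-Python sketch with toy weights (the real weights and the tanh activation are assumptions based on the linked issue, not taken from the ONNX file):

```python
import math
import random

def dense_down_projection(pooled, weight, bias):
    """Apply a Dense layer tanh(W @ x + b), projecting 768 -> 512 dims.

    pooled: list of 768 floats (pooled transformer output)
    weight: 512 x 768 matrix (list of rows)
    bias:   list of 512 floats
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, pooled)) + b)
        for row, b in zip(weight, bias)
    ]

# Toy random weights just to show the shape change; the real values live in
# the model's 2_Dense module.
pooled = [random.random() for _ in range(768)]
weight = [[random.uniform(-0.01, 0.01) for _ in range(768)] for _ in range(512)]
bias = [0.0] * 512

projected = dense_down_projection(pooled, weight, bias)
print(len(projected))  # 512
```

This is only to illustrate why the two outputs differ in size; in practice you want an export that already contains the projection (see the updates below).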
UPDATE 2: 2025-05-20
Sorry, my approach under UPDATE 1 (2025-05-20) does not work with the model.onnx provided at https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1/tree/main/onnx. However, it works with a model exported with optimum-cli in Python:
python -m pip install optimum[onnxruntime]
python -m pip install sentence-transformers
optimum-cli export onnx -m sentence-transformers/distiluse-base-multilingual-cased-v1 --task feature-extraction any/folder/you/want
This creates config.json, model.onnx, special_tokens_map.json, tokenizer_config.json, tokenizer.json, and vocab.txt under the specified folder any/folder/you/want.
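You can then point the Spring AI properties at the exported files instead of the Hugging Face URLs. Spring resolves these as resource URIs, so a file: URI should work (a sketch assuming the export landed in any/folder/you/want — adjust the paths for your setup):

```yaml
spring:
  ai:
    embedding:
      transformer:
        onnx:
          model-uri: file:any/folder/you/want/model.onnx
        tokenizer:
          uri: file:any/folder/you/want/tokenizer.json
```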
UPDATE 1: 2025-05-20
I use Python, not Java, but I found out that the model.run(...) call produces a list with two elements.
import onnxruntime as ort
onnx_input = {
"input_ids": ...,
"attention_mask": ...
}
model = ort.InferenceSession("path/to/onnx/model.onnx")
model_output = model.run(None, onnx_input) # model_output is a list with two arrays. First array has shape of (n, 768), second has shape of (n, 512)
print(model_output[0].shape) # -> (n, 768)
print(model_output[1].shape) # -> (n, 512)
I compared the embeddings from model_output[1] with the embeddings produced by SentenceTransformers, using the script from https://github.com/huggingface/optimum/issues/1519#issuecomment-1845696365 (under "Compare both original and onnx model output"), and they matched.
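If you want to repeat the check without the full linked script, the comparison essentially boils down to cosine similarity between the two embedding vectors, which should be very close to 1.0 when the outputs match. A minimal pure-Python version (the stand-in vectors below are placeholders for your real embeddings, not actual model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-ins: replace with model.encode(text) from SentenceTransformers
# and model_output[1][0] from the ONNX session.
st_embedding = [0.1, 0.2, 0.3]
onnx_embedding = [0.1, 0.2, 0.3]

# Identical vectors give a similarity of ~1.0.
print(cosine_similarity(st_embedding, onnx_embedding))
```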