The speed of obtaining embeddings on CPU/GPU

#3
by hiauiarau - opened

Hello, could you please tell me the embedding generation speed on CPU and GPU in FP16 and FP32, and how it compares with BAAI/bge-m3? Also, is the model available in ONNX format?

In FP16 it's 10 times slower than bge-m3.
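
For anyone who wants to check a figure like this on their own hardware, here is a minimal timing sketch. It is not tied to either model: `measure_throughput` and `slowdown` are hypothetical helper names, and `encode` is assumed to be any callable that takes a batch of strings and returns only once the embeddings are actually computed (on GPU this matters, because asynchronous kernels would otherwise make the timing look better than it is). Batch size, warm-up count, and repeat count are arbitrary defaults.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    encode: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    batch_size: int = 32,
    warmup_batches: int = 1,
    repeats: int = 3,
) -> float:
    """Return the best observed throughput in texts per second.

    `encode` is an assumption: any blocking function that embeds a batch.
    """
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    # Warm-up runs are excluded from timing (lazy init, caches, kernel compilation).
    for batch in batches[:warmup_batches]:
        encode(batch)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for batch in batches:
            encode(batch)
        best = min(best, time.perf_counter() - start)

    # Guard against a zero interval when `encode` is trivially fast.
    return len(texts) / max(best, 1e-12)


def slowdown(candidate_tps: float, baseline_tps: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return baseline_tps / candidate_tps
```

To compare two models, run `measure_throughput` once per model with the same texts, device, precision, and batch size, then pass the two results to `slowdown`. A value of 10 would correspond to the "10 times slower" reported above.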
