Int8 quantization with ONNX Runtime

#4
by Florianoli

Hi,
I'm currently trying to quantize the model to int8. Somehow, the sparse representations are missing from the resulting model. How did you manage to keep the sparse embeddings with int8 quantization?

Thanks!
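
For reference, a minimal sketch of the dynamic int8 quantization path in ONNX Runtime that I'm assuming is being used here. The file paths and the excluded node name are placeholders, and excluding the nodes that emit the sparse logits is just one way to keep that head in fp32:

```python
# Minimal sketch (not necessarily the model author's exact recipe):
# dynamic int8 quantization with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # exported fp32 model (placeholder path)
    model_output="model-int8.onnx",  # quantized output (placeholder path)
    weight_type=QuantType.QInt8,
    # If quantization zeroes out the sparse head, keeping the layers that
    # produce the sparse logits in fp32 may help; this node name is
    # hypothetical and would need to be looked up in the exported graph.
    nodes_to_exclude=["/cls/predictions/decoder/MatMul"],
)
```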
