Int8 quantization with ONNX Runtime

#4
by Florianoli

Hi,
I'm currently trying to quantize the model to int8. Somehow, the sparse representations are missing from the resulting model. How did you manage to keep the sparse embeddings with int8 quantization?

Thanks!
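
For reference, a minimal sketch of the dynamic int8 quantization path in ONNX Runtime that I'm assuming is being used here. The file paths and the excluded node name are placeholders, and excluding the nodes that emit the sparse logits is just one way to keep that head in fp32:

```python
# Minimal sketch (not necessarily the model author's exact recipe):
# dynamic int8 quantization with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # exported fp32 model (placeholder path)
    model_output="model-int8.onnx",  # quantized output (placeholder path)
    weight_type=QuantType.QInt8,
    # If quantization zeroes out the sparse head, keeping the layers that
    # produce the sparse logits in fp32 may help; this node name is
    # hypothetical and would need to be looked up in the exported graph.
    nodes_to_exclude=["/cls/predictions/decoder/MatMul"],
)
```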
