Can BAAI/bge-reranker-v2-gemma be run quantized?
#3 · opened by dophys
Hello, I'm interested in the bge-reranker built on Gemma. My question is whether this model can be run in quantized form, which would greatly improve inference efficiency and reduce memory requirements.
I used PyTorch to quantize the model to int8, but FlagEmbedding doesn't seem to support running quantized models. Can anyone give me some guidance?
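For context, the direction I'm exploring is to bypass FlagEmbedding and load the model directly with transformers plus bitsandbytes 8-bit quantization. Below is a minimal sketch; the prompt string and the "Yes"-token scoring follow my reading of the model card, so treat the exact template as an assumption and correct me if this isn't how the reranker is meant to be scored:

```python
# Sketch: load BAAI/bge-reranker-v2-gemma in 8-bit via bitsandbytes and
# score a (query, passage) pair by the logit of the "Yes" token at the
# last position, roughly following the model card's usage example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "BAAI/bge-reranker-v2-gemma"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model.eval()

# Token id of "Yes", whose logit serves as the relevance score.
yes_id = tokenizer("Yes", add_special_tokens=False)["input_ids"][0]

# Instruction prompt taken from the model card; the surrounding
# "A: ... / B: ..." layout is my simplification of its get_inputs().
prompt = (
    "Given a query A and a passage B, determine whether the passage "
    "contains an answer to the query by providing a prediction of "
    "either 'Yes' or 'No'."
)

def score(query: str, passage: str) -> float:
    text = f"A: {query}\nB: {passage}\n{prompt}"
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=1024
    ).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Higher "Yes" logit at the final position => more relevant passage.
    return logits[0, -1, yes_id].float().item()

print(score("what is panda?",
            "The giant panda is a bear species endemic to China."))
```

If this route is viable, the int8 checkpoint I produced with torch quantization wouldn't even be needed, since bitsandbytes quantizes the weights at load time.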