Support for quantized cache

#5
by dragstoll - opened

Hi
Is it possible to use quantized cache with this model?
I tried to use it with KV cache quantization:
cache_implementation="quantized",
cache_config={"nbits": 4, "backend": "quanto"},

But I am getting an error: This model does not support the quantized cache. If you want your model to support quantized cache, please open an issue.
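
For context, a minimal sketch of one possible workaround while the model lacks support: try the quantized cache first and retry with the default cache if it is rejected. The helper name `generate_with_cache_fallback` is hypothetical, `generate_fn` stands in for something like `model.generate`, and the sketch assumes the error above surfaces as a `ValueError` with that message.

```python
from typing import Any, Callable

# Cache settings copied from the post above.
QUANTIZED_CACHE_KWARGS = {
    "cache_implementation": "quantized",
    "cache_config": {"nbits": 4, "backend": "quanto"},
}


def generate_with_cache_fallback(generate_fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Call generate_fn with the quantized KV cache, falling back to the default cache.

    Hypothetical helper: generate_fn is expected to behave like `model.generate`.
    Assumption: an unsupported model raises ValueError mentioning "quantized cache".
    """
    try:
        return generate_fn(**kwargs, **QUANTIZED_CACHE_KWARGS)
    except ValueError as err:
        if "quantized cache" not in str(err):
            raise  # unrelated error, do not hide it
        print(f"Falling back to the default cache: {err}")
        return generate_fn(**kwargs)
```

With a loaded model and tokenized inputs, this could be called as `generate_with_cache_fallback(model.generate, **inputs, max_new_tokens=32)`. The fallback only avoids the error; it does not give the memory savings of a quantized cache.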
