Is a higher quant possible?
#1, opened by jackboot
Would something between 4 and 5 bits be possible? Or is the FP16 completely janky because it was dequantized?
I'd think that if we had the measurement.json, we could just quant it to a different size. Exl2 needs a monkeypatch to skip a sanity check before the quant will run, though.
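
For anyone who wants to try it, here is a rough sketch of what reusing a measurement.json could look like. It only builds the command line for ExLlamaV2's `convert.py` and does not run it. The helper name and all paths are hypothetical placeholders, 4.5 bpw is just an example target, and the sanity-check monkeypatch mentioned above is not covered.

```python
import shlex


def build_exl2_requant_command(model_dir, work_dir, out_dir, measurement_path, bits):
    """Build (but do not run) a convert.py command that reuses an existing measurement.

    Hypothetical helper: every path here is a placeholder supplied by the caller.
    """
    if bits <= 0:
        raise ValueError("bits per weight must be positive")
    return [
        "python", "convert.py",
        "-i", model_dir,          # directory with the FP16 weights
        "-o", work_dir,           # scratch directory for the conversion job
        "-cf", out_dir,           # where the finished quant is written
        "-m", measurement_path,   # existing measurement.json, so the measurement pass is skipped
        "-b", f"{bits:.2f}",      # target average bits per weight
    ]


cmd = build_exl2_requant_command(
    "models/fp16", "work", "models/exl2-4.5bpw", "measurement.json", 4.5
)
print(shlex.join(cmd))
```

Run the printed command from the exllamav2 checkout. Whether the result is any good still depends on how much the dequantized FP16 lost in the first place.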