Is the Q8_0 quant also imatrix'd? Why?
What was the basis of the decision to use imatrix vs. regular quantization for Q8_0? Doesn't imatrix reduce performance?
It shouldn't reduce performance (unless you have a source on that), but it also shouldn't affect it much, if at all, since at Q8 there's no need to compress some portions more than others.
Edit: this is based on old knowledge. If you come across this, the real answer is that Q8 quants completely disable the imatrix, as you can see here:
Was just pointed at this discussion - imatrix data is not used for Q8_0 quants, so the resulting quant will be essentially the same regardless of whether an imatrix is specified or not (the only thing that changes is the header values saying which imatrix file is used etc., not the actual model data). Q6_K is the highest-bpw tensor format that uses the imatrix data for quantisation.
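For anyone who wants to see why the imatrix has nothing to act on here, below is a minimal Python sketch of Q8_0-style block quantisation. It assumes the usual layout of 32-value blocks with one scale per block. The function name, the `importance` parameter, and the plain float scale (the real format stores it as fp16) are illustrative assumptions, not llama.cpp's actual API.

```python
import math

QK8_0 = 32  # values per block (assumed, matching the usual Q8_0 layout)


def round_half_away(x):
    """Round to nearest integer, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0_block(values, importance=None):
    """Quantise one block to (scale, list of int8-range values).

    `importance` stands in for per-weight imatrix data. It is accepted
    only to show that this scheme has no use for it: it is never read.
    """
    if len(values) != QK8_0:
        raise ValueError(f"expected {QK8_0} values, got {len(values)}")
    amax = max(abs(v) for v in values)   # largest magnitude in the block
    d = amax / 127.0                     # one scale for the whole block
    inv_d = 1.0 / d if d else 0.0        # guard against an all-zero block
    qs = [round_half_away(v * inv_d) for v in values]
    return d, qs


if __name__ == "__main__":
    block = [float(i - 16) for i in range(QK8_0)]
    fake_imatrix = [1000.0 if i % 2 else 0.001 for i in range(QK8_0)]
    plain = quantize_q8_0_block(block)
    weighted = quantize_q8_0_block(block, importance=fake_imatrix)
    print(plain == weighted)
```

Each value is rounded independently against a scale fixed by the block's largest magnitude, so in this sketch there is no rounding trade-off left for importance weights to steer.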
Yeah, this is a very old comment of mine. I've since learned that the code itself explicitly disables the imatrix for Q8 before doing the quantization.
updated the comment in case others come across it