imatrix.dat missing output.weight and token_embd.weight

#1
by tdh111 - opened

When making my own quants using your imatrix.dat (thank you guys so much for providing these; for a model like this it would be far too compute-intensive for me to do it myself, and even on smaller models your dataset in my experience produces better quants than the open-source datasets I could find), the output contained:

```
====== llama_model_quantize_internal: did not find weights for output.weight
[...]
====== llama_model_quantize_internal: did not find weights for token_embd.weight
```

Looking at the llama.cpp source shows that this message is printed when a tensor is missing from the imatrix.
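
For reference, here is roughly what that check looks like as I read it. This is a simplified, self-contained paraphrase for illustration, not the actual llama.cpp source; the map contents here are made up:

```cpp
// Simplified paraphrase of the lookup behind the message above (not the
// real llama.cpp code). The imatrix file is essentially a map from tensor
// name to accumulated activation statistics; when a tensor has no entry,
// the quantizer prints the message and proceeds without importance weights.
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    // Made-up imatrix contents for illustration.
    std::unordered_map<std::string, std::vector<float>> imatrix_data = {
        {"blk.0.attn_q.weight", {0.1f, 0.2f}},
        // note: no entries for "output.weight" or "token_embd.weight"
    };

    const std::string names[] = {
        "blk.0.attn_q.weight", "output.weight", "token_embd.weight",
    };
    for (const std::string & name : names) {
        if (imatrix_data.find(name) == imatrix_data.end()) {
            // Informational only; quantization still succeeds.
            std::printf("====== llama_model_quantize_internal: did not find weights for %s\n",
                        name.c_str());
        }
    }
    return 0;
}
```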

I'm just curious whether this happened with your imatrix quants too, and if so, whether you know why this model has this issue. I've used your imatrix.dat for other models, and I'm fairly certain this didn't happen there.

This can happen when there is not enough measurement coverage for a tensor, or when llama.cpp decides not to measure it at all, which I think is the case here. In other words, it is almost certainly normal: these tensors will simply be quantized differently (i.e. with more bits).
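
Roughly speaking, the type picker special-cases these two tensors. Here is a heavily simplified sketch, loosely modeled on llama_tensor_get_type in llama.cpp (the actual type chosen depends on the requested mix and the architecture; Q6_K is just a representative example):

```cpp
#include <string>

enum Type { Q4_K, Q6_K, Q8_0 };  // tiny stand-in for ggml_type

// output.weight and token_embd.weight are bumped to a higher-bit type
// than the bulk of the model, so a missing imatrix entry for them
// matters much less than it would for an attention or FFN tensor.
Type pick_type(const std::string & name, Type requested_mix) {
    if (name == "output.weight" || name == "token_embd.weight") {
        return Q6_K;  // representative; the real rule varies with the ftype
    }
    return requested_mix;  // everything else follows the requested mix
}

int main() {
    return pick_type("output.weight", Q4_K) == Q6_K ? 0 : 1;
}
```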

it is almost certainly normal

Thanks for the confirmation. It takes me 4 hours just to make a quant, so I don't know how many recipes I will try. I know the Unsloth recipes use a mix of non-standard down_proj, embed, and lm_head types on their Q2 mixes, but that's most likely to compensate for Q2 being too small. My current quant is ~4.5 bpw, and I'm thinking of either keeping the same size with better accuracy, or going smaller with negligible accuracy loss, which would boost performance.
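
I'll probably start with llama-quantize's per-tensor overrides, something like the following (file names are placeholders; check ./llama-quantize --help for the exact spelling of the type names):

```sh
# Keep the Q4_K_M body but pin the embedding and output tensors to 8-bit.
./llama-quantize --imatrix imatrix.dat \
    --token-embedding-type q8_0 \
    --output-tensor-type q8_0 \
    model-F16.gguf model-Q4_K_M-custom.gguf Q4_K_M
```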

All the power to you :)
