Venketh

venketh

AI & ML interests

None yet

Organizations

None yet

venketh's activity

replied to bartowski's post about 1 month ago

Would also be helpful for gpt4all, since Q4_0, Q4_1, and FP16 are our only options there.

reacted to bartowski's post with 👍 ❤️ 6 months ago
So it turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp.

The first part is true: imatrix runs the model against a corpus of text and tracks the activations of the weights to determine which are most important.

However, what the quantization then does with that information is where I was wrong.

I think I accidentally conflated imatrix with ExLlamaV2's measurement pass, where ExLlamaV2 decides how many bits to assign to each weight depending on the target BPW.

Instead, what llama.cpp with imatrix does is attempt to select a scale for each quantization block that most accurately returns the important weights to their original values, i.e. it minimizes the dequantization error weighted by the importance of the activations.

The mildly surprising part is that it actually just does a relatively brute-force search: it picks a bunch of candidate scales, tries each one, and keeps whichever results in the minimum error for the weights deemed important in the block.

But yeah, it turns out the quantization scheme is always the same; it's just that the scale selection has a bit more logic to it when you use imatrix.

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up
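
A minimal sketch of the importance-weighted scale search described in the post above, assuming a simplified signed 4-bit grid, a block size of 32, and a hypothetical candidate range and error weighting (this is an illustration of the idea, not llama.cpp's actual code):

```python
import numpy as np

def quantize_block_with_imatrix(weights, importance, n_bits=4, n_candidates=64):
    """Importance-weighted brute-force scale search for one quantization block.

    Sketch only: try many candidate scales, quantize/dequantize the block with
    each, and keep the scale with the lowest importance-weighted squared error.
    """
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 7 for a signed 4-bit grid
    base_scale = np.max(np.abs(weights)) / qmax
    if base_scale == 0:                       # all-zero block: nothing to search
        return np.zeros_like(weights, dtype=np.int64), 0.0

    best_q, best_scale, best_err = None, base_scale, np.inf

    # Brute-force search: perturb the naive max-abs scale and measure the
    # importance-weighted reconstruction error for each candidate.
    for factor in np.linspace(0.8, 1.2, n_candidates):
        scale = base_scale * factor
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)  # quantize
        err = np.sum(importance * (weights - q * scale) ** 2)    # weighted error
        if err < best_err:
            best_q, best_scale, best_err = q, scale, err

    return best_q, best_scale

# Example usage with random data standing in for one block of model weights
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)      # one block of 32 weights
imp = rng.uniform(0.1, 1.0, size=32)            # activation-based importance
q, s = quantize_block_with_imatrix(w, imp)
print("best scale:", s)
```

Without the importance weights, every element of the block would count equally; weighting the error is what lets the search favor scales that reconstruct the important weights most accurately.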
New activity in NousResearch/GPT4-x-Vicuna-13b-fp16 11 months ago

IQ4_NL

#1 opened 12 months ago by venketh
New activity in athirdpath/Psyonic_Sydney-20b-GGUF 11 months ago

quants

#1 opened 11 months ago by venketh
New activity in naxautify/gpt2-4k over 1 year ago

How did you train / FT this?

#2 opened over 1 year ago by venketh