Would also be helpful for gpt4all, since Q4_0, Q4_1, and FP16 are our only options there.
Venketh
venketh
AI & ML interests
None yet
Recent Activity
updated a model about 1 month ago: venketh/Qwen2.5-Coder-14B-Instruct-gguf
updated a model about 1 month ago: venketh/Qwen2.5-Coder-7B-Instruct-gguf
Organizations
None yet
venketh's activity
Upload Mistral-Nemo-Instruct-2407-Q4_0.gguf · 6 · #5 opened 6 months ago by venketh
reacted to bartowski's post with ❤️ 6 months ago
Post · 10074
So it turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp.
It starts out true: imatrix runs the model against a corpus of text and tracks the activations of the weights to determine which are most important.
However, what the quantization then does with that information is where I was wrong.
I think I made an accidental connection between imatrix and ExLlamaV2's measurement pass, where ExLlamaV2 decides how many bits to assign to which weights depending on the target BPW.
Instead, what llama.cpp with imatrix does is attempt to select a scale for each quantization block that most accurately returns the important weights to their original values, i.e. minimizing the dequantization error weighted by the importance of the activations.
The mildly surprising part is that it does a relatively brute-force search: it picks a bunch of candidate scales, tries each one, and keeps whichever yields the minimum error for the weights deemed important in the block.
But yeah, it turns out the quantization scheme is always the same; it's just that the scaling has a bit more logic to it when you use imatrix.
Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up.
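To make that scale search concrete, here is a minimal NumPy sketch of the general idea. It is not llama.cpp's actual code: the symmetric 4-bit range, the candidate-scale grid, and the block size are illustrative assumptions, but it shows how an importance-weighted error can drive the choice of a block scale.

```python
import numpy as np

def best_scale_q4(block, importance, n_candidates=32):
    """Illustrative only: pick a block scale that minimizes the
    importance-weighted dequantization error for a symmetric 4-bit
    quantization (integer values clamped to [-8, 7])."""
    base = np.max(np.abs(block)) / 7.0  # naive scale: fit the largest weight
    best_err, best_scale = np.inf, base
    # brute-force search over candidate scales around the naive one
    for factor in np.linspace(0.8, 1.2, n_candidates):
        scale = base * factor
        if scale == 0:
            continue
        q = np.clip(np.round(block / scale), -8, 7)    # quantize
        deq = q * scale                                # dequantize
        err = np.sum(importance * (block - deq) ** 2)  # importance-weighted error
        if err < best_err:
            best_err, best_scale = err, scale
    return best_scale

# Toy usage: one 32-weight block with per-weight importances standing in
# for what an imatrix-style pass would provide.
rng = np.random.default_rng(0)
block = rng.normal(size=32).astype(np.float32)
importance = rng.uniform(0.1, 1.0, size=32).astype(np.float32)
print(best_scale_q4(block, importance))
```

With a uniform importance vector the same search just minimizes plain squared error, which matches the point above: the quantization format itself does not change, only the logic that picks the scale does.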
IQ1_S or IQ_M for low RAM/VRAM computers · 12 · #20 opened 10 months ago by teneriffa
Were all the quantizations produced w/ importance matrices? · 7 · #19 opened 10 months ago by venketh
How did you train / FT this? · #2 opened over 1 year ago by venketh