numen-tech/Meta-Llama-3.1-70B-Instruct-w4a16g128asym · Will we see a w3 this go-round?

Jul 30

Can't thank you enough for keeping the OmniQuant...quant alive! And for the great app, of course; I finally got around to using it.

As the title says, though honestly I'd personally be much more excited about a w3 Mistral-Large-2407, since that'll give just enough wiggle room for my MBP M1 Max to make use of it 🥳
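For anyone wanting to sanity-check the "wiggle room" math, here's a rough back-of-the-envelope sketch of weight memory for group-quantized models. It assumes Mistral-Large-2407's 123B parameter count, plus one 16-bit scale and one 16-bit zero-point stored per group (that storage layout is my assumption, not something confirmed for these particular quants). It ignores the KV cache, activations, and any layers kept at higher precision, so real usage will be higher.

```python
# Rough weight-memory estimate for asymmetric group quantization.
# Assumption: each group of `group_size` weights stores one 16-bit scale
# and one 16-bit zero-point; actual formats may pack these differently.

def effective_bits(w_bits: int, group_size: int,
                   scale_bits: int = 16, zero_bits: int = 16) -> float:
    """Average bits per weight, including per-group scale/zero-point overhead."""
    return w_bits + (scale_bits + zero_bits) / group_size


def weight_gb(n_params: float, w_bits: int, group_size: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * effective_bits(w_bits, group_size) / 8 / 1e9


if __name__ == "__main__":
    n_params = 123e9  # Mistral-Large-2407 parameter count
    for name, w_bits, group_size in [("w4g128", 4, 128),
                                     ("w3g128", 3, 128),
                                     ("w2g64", 2, 64)]:
        print(f"{name}: ~{weight_gb(n_params, w_bits, group_size):.1f} GB of weights")
```

Compare the numbers against whatever unified memory your machine has, leaving headroom for the OS and context.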

numen-tech

Owner Aug 9

It gets better than that! I'd recommend taking a look at the w2g64 version of Mistral-Large-Instruct-2407 here.

nb: EfficientQAT literally is OmniQuant v2. :)

BuildBackBuehler

Aug 10

Hahaha, pretty quick on the trigger there, but not quick enough for me! I happened to stumble upon our prophet's (or at least mine 🤪) legendary work on Reddit a couple of days ago. I can't celebrate entirely yet, though, considering GPTQ isn't Apple-friendly. That said, Triton is now Metal/MPS-compatible...enough to be usable. I used it to run AQLM 2-bit previously. My head is still spinning over how to get this running in any case. I hope y'all figure out (or intend to figure out) how to!

There's also another ML dev I'm a groupie of, who apparently threw together this piece of code in his spare time: https://gist.github.com/philipturner/4ad866cf537daaedc033acf18e29d65d I reached out to him. I doubt I can make any use of it as-is, but I figure y'all could. I don't know the inner workings of PrivateLLM, but it may pay to look at his other work; apparently his Metal Flash Attention project outpaces MPS.

https://github.com/philipturner/metal-flash-attention

So I'll just come out and say it: if PrivateLLM wants a competitive edge by getting input from a hobbyist obsessed with bleeding-edge solutions, in a quid pro quo for talk-me-off-the-ledge code integration support, I'm your man & of course avail. on Discord, Matrix, GChat, LinkedIn and whatever else. I use PLLM on my phone, but it's the wild west on my Mac. PrivateLLM has a ways to go there, unless I'm able to put my models up behind an OpenAI API-compatible server so I can incorporate Aider, K.R.A.G.E.N., RAG w/ Knowledge Graphs, AI Agents....ad infinitum/nauseam, in which case I've been dumb not to!