Request for my models

#498
by DavidAU - opened

Hey guys,

I want to ask a favor - do you mind if I copy over the regular-quant repos for my models and attribute them to you?
(regular, not imatrix) - and also link to your imatrix versions at your repo?

The main reasons: first, a long, detailed usage page with examples; second, a backup; third, slow upload speeds on my end.
This will also allow me to focus exclusively on creating models and uploading source versions instead of source + GGUFs.

Kinda went a little "MOE" crazy... built and uploaded 9 in the past 5 days; several more in the pipe ready to upload too.

I uploaded 3 (MOE) source repos today, and a 4th is uploading... I can see 2 in your queue right now - thank you!

( #3 https://huggingface.co/DavidAU/L3-MOE-4x8B-Dark-Planet-Rebel-FURY-25B - just finished uploading an hour ago )
( #4 https://huggingface.co/DavidAU/L3-MOE-4X8B-Grand-Horror-25B - this one is still uploading at my end. )

Thanks again for all you do.
If this is an issue, no worries.

If you mean you want to somehow copy/mirror the quants, sure, be my guest - there is no legal, moral, or ethical reason not to :) I don't care about attribution either, but it's a service to the people who come after us to have a track record, of course - it doesn't have to be prominent or anything, it just has to exist so that people who look can find it.

We are, btw., trying to consolidate downloading and overview a bit, example page: https://hf.tst.eu/model#L3-DARKEST-PLANET-16.5B-i1-GGUF

That way, people can see all quants (both static and imatrix) in one table, sort them, conveniently download multipart quants, and see the original README, all on a single page. We plan to put a prominent link to that page on every model card, but due to real world problems, this is delayed.

As for your models, sure, keep them coming (and feel free to point out models I might have overlooked). I forgot which model it was, but you kept me waiting almost a whole day for the upload to complete (I queued it before it was complete :)

Also, yes, your output volume over the past days has not gone unnoticed :)

mradermacher changed discussion status to closed

Excellent. That is wonderful.

RE: Attribution - I will still add it; you guys deserve the credit.

Likewise, I can integrate the imatrix versions into the model card, offer users the imatrix option, and explain how to use the imatrix quants effectively (and of course which quants make sense relative to parameter count, etc.). On a personal note, I can also download your imatrix files; there are new issues with CUDA/imatrix generation which are adding boatloads of time to local imatrix runs.

I will be building a lot more MOEs (and larger ones too) going forward.

RE: MOEs:
I will likely be cloning a few source models as well, so if you see the same model with "MIN"/"MAX" in the name, it is the same MOE model with the default number of experts changed.

Here is why and how MOEs are getting a raw deal:

1 - Some LLM/AI apps allow fairly easy access to change the number of experts used at model load time, but many do not.
Case in point: I had to figure out a "hack" to get llama-server.exe to change the number of experts used (gonna be a ticket about that!!!) - see the sketch right after this list.
Many others have no way at all... locking the user out of the full power of the model.
(e.g. Backyard, Ollama, KoboldCpp (ticket in with them))
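
For reference, the kind of override I mean looks roughly like this - a sketch only, assuming a recent llama.cpp build that supports --override-kv and a Llama-architecture GGUF; the filename is just a placeholder and the exact metadata key depends on the model architecture:

```
# force 3 experts active instead of the model's baked-in default of 2
llama-server -m some-L3-MOE-4x8B.Q4_K_M.gguf --override-kv llama.expert_used_count=int:3
```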

So I will be cloning the source repos with an altered config.json (I actually put how to do this on the model cards of the source MOE models); this way,
new GGUFs/imatrix quants will be created with the number of experts "hard-coded".
I will also be writing up a doc on how to do this manually using some online tools.
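
A minimal sketch of the config.json edit described above, assuming a mergekit-built, Mixtral-architecture MOE where the default expert count lives under num_experts_per_tok (check your own config.json for the exact key):

```python
# Bump the default number of active experts in a cloned MOE source repo's config.json
# so freshly made GGUFs/imatrix quants pick up the higher count by default.
# Assumes a Mixtral-style config; the key may differ for other MOE architectures.
import json

with open("config.json") as f:
    cfg = json.load(f)

print("current default experts:", cfg.get("num_experts_per_tok"))
cfg["num_experts_per_tok"] = 3  # e.g. 3 of 4 experts active by default

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```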

2 - Most, if not all, MOEs right now default to 2 experts. This is not the optimal setup, especially with certain types of MOEs.
In fact, a few papers indicate the minimum optimum is 3, not 2.
With a 4x MOE you are running at "50% power", and considering issue #1 above, most people either cannot activate "more power", won't, or are unaware of it.
With 6x or 8x that is 33% or 25% power; 10x is 20%... you get the idea (quick arithmetic sketch below).
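
The percentages above are just the fraction of experts active at the default setting:

```python
# "Power" here is simply experts_used / experts_total at the default setting.
def active_fraction(experts_used: int, experts_total: int) -> float:
    return experts_used / experts_total

for total in (4, 6, 8, 10):
    print(f"{total}x MOE at the default 2 experts: {active_fraction(2, total):.0%} power")
# 4x -> 50%, 6x -> 33%, 8x -> 25%, 10x -> 20%
```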

This problem is compounded further by some MOE setups (at build time) that are just plain wrong / lazy.
And few take advantage of putting "like function" models together in a MOE (versus mixing, say, a coder, a storyteller, a generalist, and a medical model).

3 - Impact on the user:
For certain configs (actually the easiest MOEs to build), power and performance go up drastically as more experts are "brought online".
For creative use - nuance, emotion, and detail all jump.
(case in point: https://huggingface.co/DavidAU/Mistral-MOE-4X7B-Dark-MultiVerse-24B-GGUF )
(case in point (set at 4 experts): https://huggingface.co/DavidAU/L3-MOE-8X8B-Dark-Planet-8D-Mirrored-Chaos-47B)

For logic/problem solving, abilities appear: i.e. a "riddle" or "problem" with a poor solution or no solution at 2 experts becomes solved at 3 experts.
( case in point: https://huggingface.co/DavidAU/Llama-3.2-4X3B-MOE-Ultra-Instruct-10B-GGUF )

Likewise, instruction-following performance increases with more experts too.
Users are being cheated out of a MOE's raw power, and the user "experience" with a MOE is less than it could be... affecting expected MOE performance and open source in general.

4 - VRAM/T/S savings:

You can run a "64B" model at "47B" MOE VRAM and T/S, and control the power level, i.e. run an 8X8B at 6, 7, or 8 experts.
Imagine a 4X70B: 280B "nominally" would be roughly 210B as a MOE - at 2 experts that would be great, 3 awesome... 4... MIGHT BE SKYNET.
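
A rough back-of-envelope for where those totals come from - the ~30% "shared" share below is an assumption (attention, norms, and embeddings are shared across experts; only the FFN/expert weights are duplicated), and the exact split varies by model:

```python
# Illustrative only: estimate total MOE parameters from the dense base model size,
# assuming ~30% of the dense model is shared (attention, norms, embeddings) and
# only the remaining ~70% (FFN/expert weights) is duplicated per expert.
def moe_total_params_b(n_experts: int, dense_b: float, shared_share: float = 0.30) -> float:
    shared = dense_b * shared_share
    per_expert = dense_b * (1.0 - shared_share)
    return shared + n_experts * per_expert

print(f"8x8B  -> ~{moe_total_params_b(8, 8):.0f}B total (vs 64B nominal)")    # ~47B
print(f"4x70B -> ~{moe_total_params_b(4, 70):.0f}B total (vs 280B nominal)")  # ~217B
```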

RE: Waiting a whole day - sorry about that, user error on my end... I let it run overnight!
