How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts

This document discusses how to set/change the Mixture of Experts (the number of active experts) in various LLM/AI apps and includes links to additional MOE models and other helpful resources.


LINKS:


Mixture Of Expert Models - including Reasoning/Thinking - GGUF:

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

All Models Source Code - For GGUFs, AWQ, HQQ, GPTQ, EXL2 and direct usage - including MOEs:

[ https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be ]

Additional:

#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]

#2 All Reasoning/Thinking Models - including MOEs - (collection) (Source code to generate GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]

#3 All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]

These collections will update over time. Newest items are usually at the bottom of each collection.


Main Document - Setting Mixture Of Experts in LLM/AI apps


Experts Activation / Models used to build this model:

The number of active experts can be set to 1, 2, 4, 8 or more, up to the total number of experts the model was built with; on a model built with 4 experts, for example, you can use 1, 2, 3, or 4.

This "team" has a Captain (first listed model), and then all the team members contribute to the to "token" choice billions of times per second. Note the Captain also contributes too.

Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you.

This results in higher quality generation.

In many cases this also results in higher quality instruction following.

That means the power of every model is available during instruction following and output generation.

NOTE:

You can use one "expert" too ; however this means the model will randomly select an expert to use EACH TIME, resulting in very different generation for each prompt / regen of a prompt.

CHANGING THE NUMBER OF EXPERTS:

You can set the number of experts in LMStudio (https://lmstudio.ai) on the "load" screen; in other apps/LLM apps, look for a setting called "Experts" or "Number of Experts".

For Text-Generation-Webui (https://github.com/oobabooga/text-generation-webui) you set the number of experts on the model loading page.

For KoboldCPP (https://github.com/LostRuins/koboldcpp) version 1.8+, on the load screen click on "TOKENS"; you can set the number of experts on this page, then launch the model.

For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md ), add the following to the command line that starts the "llama.cpp server" (CLI):

"--override-kv llama.expert_used_count=int:3"

(no quotes, where "3" is the number of experts to use)

When using "API", you set the "num_experts_used" in the JSON payload (this maybe different for different back ends).

SUGGESTION:

The MOE models at my repo:

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

contain various examples, including example generations showing 2, 4, and 8 experts.

This will give you a better idea of what changes to expect when adjusting the number of experts and the effect on generation.
