L3-MOE-4X8B-Grand-Horror-25B

This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.

NOTE: Links to GGUFs below.

About This Model:

This model is based on the original "Llama 3 Dark Planet 8B" (GGUF / SOURCE) - which contains 3 different models and I have added "Gutenberg 8B" [https://huggingface.co/nbeerbower/llama-3-gutenberg-8B] as the forth model for this MOE.

This model contains FOUR different 8B models in a MOE model at 25B, equal to 4X8B - 32B parameters.

SIDE NOTE:

Uusually a "MOE" is constructed with different models, to give the "moe model" some of the best of each (or not) during generation.

I felt turning this concept on its head was better for creative use cases.

I.E:

All the "chefs" in the kitchen went to the same elite cooking school, got the highest marks, and now all work together to make the very best "dish of tokens" they can every time.

POWER UP? or DOWN?

You can change the number of experts (models) activated inside many LLM/AI apps.

Turning it up increases quality, nuance and depth but at the same time the tokens per second drops accordingly.

You can use 1 expert for "draft mode", and then move up in experts to get to final draft.

Also note instruction following will also increase as you up the number of experts too.

Quant choice will also affect overall quality => higher is better, however even at the lowest quant level, this model will perform strongly.

MOE SPECIFIC NOTES:

If you want to change the "default" number of experts set, modify the "config.json" :

"num_experts_per_tok": 2,

The user will still be able to modify it, if the LLM/AI app has the setting option to do this.

Each time you add/subtract an expert the token per second speed will change.

( this model is set at 2 out of 4 experts active, more experts => greater quality. )

IMPORTANT: Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

If you are going to use this model, (source, GGUF or a different quant), please review this document for critical parameter, sampler and advance sampler settings (for multiple AI/LLM aps).

This a "Class 1" (settings will enhance operation) model:

For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) (especially for use case(s) beyond the model's design) please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

REASON:

Regardless of "model class" this document will detail methods to enhance operations.

If the model is a Class 3/4 model the default settings (parameters, samplers, advanced samplers) must be set for "use case(s)" uses correctly. Some AI/LLM apps DO NOT have consistant default setting(s) which result in sub-par model operation. Like wise for Class 3/4 models (which operate somewhat to very differently than standard models) additional samplers and advanced samplers settings are required to "smooth out" operation, AND/OR also allow full operation for use cases the model was not designed for.

BONUS - Use these settings for ANY model, ANY repo, ANY quant (including source/full precision):

This document also details parameters, sampler and advanced samplers that can be use FOR ANY MODEL, FROM ANY REPO too - all quants, and of course source code operation too - to enhance the operation of any model.

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

NOTE:

I strongly suggest you also visit the DavidAU GGUF (below) repo too for more details in using this model ; especially if it is "Class 3" or "Class 4" to get maximum performance from the model.

For full information about this model, including:

  • Details about this model and its use case(s).
  • Context limits
  • Special usage notes / settings.
  • Any model(s) used to create this model.
  • Template(s) used to access/use this model.
  • Example generation(s)
  • GGUF quants of this model

Please go to:

[ https://huggingface.co/DavidAU/L3-MOE-4X8B-Grand-Horror-25B-GGUF ]


Quants by Team "Mradermacher":

GGUFS:

[ https://huggingface.co/mradermacher/L3-MOE-4X8B-Grand-Horror-25B-GGUF ]

IMATRIX GGUFS:

[ https://huggingface.co/mradermacher/L3-MOE-4X8B-Grand-Horror-25B-i1-GGUF ]

Downloads last month
23
Safetensors
Model size
24.9B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for DavidAU/L3-MOE-4X8B-Grand-Horror-25B

Quantizations
3 models

Collection including DavidAU/L3-MOE-4X8B-Grand-Horror-25B