WizardLM-2-4x7B-MoE-exl2-3_0bpw

This is a quantized version of WizardLM-2-4x7B-MoE an experimental MoE model made with Mergekit. Quantization was done using version 0.0.18 of ExLlamaV2.

Please be sure to set experts per token to 4 for the best results! Context length should be the same as Mistral-7B-Instruct-v0.1 (8k tokens). For instruction templates, Vicuna-v1.1 is recommended.

For more information see the original repository.

Downloads last month
11
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.