ehartford/WizardLM-7B-Uncensored quantized to 8-bit GPTQ with group size 128, true sequential, and no act order.
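
To make "8-bit with group size 128" concrete: every run of 128 consecutive weights along a row shares one scale and one zero point. With no act order (desc_act=False), columns keep their original order, so a weight's group is simply its index // 128, which is what this sketch assumes. It uses plain round-to-nearest as a named substitute. It is not GPTQ itself, because GPTQ also adjusts the not-yet-quantized weights to compensate for rounding error. The function names are hypothetical.

```python
# Group-wise asymmetric quantization via round-to-nearest (NOT GPTQ's error-compensating update).
# Illustrates only the layout: one scale and one zero point per `group_size` consecutive weights.

def quantize_row(row, bits=8, group_size=128):
    """Quantize one weight row; each group of `group_size` values shares a scale and zero point."""
    qmax = 2 ** bits - 1
    codes, scales, zeros = [], [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)  # keep 0.0 representable
        scale = (hi - lo) / qmax or 1.0  # avoid dividing by zero for an all-zero group
        zero = round(-lo / scale)
        # Map each weight to an integer code in [0, qmax].
        codes.extend(min(max(round(w / scale) + zero, 0), qmax) for w in group)
        scales.append(scale)
        zeros.append(zero)
    return codes, scales, zeros


def dequantize_row(codes, scales, zeros, group_size=128):
    """Recover approximate weights; code i uses the scale/zero of group i // group_size."""
    return [(q - zeros[i // group_size]) * scales[i // group_size] for i, q in enumerate(codes)]


row = [((i * 37) % 101 - 50) / 500 for i in range(256)]  # arbitrary example weights
codes, scales, zeros = quantize_row(row)
restored = dequantize_row(codes, scales, zeros)
print(len(scales), max(abs(a - b) for a, b in zip(row, restored)))
```

A 256-weight row yields two groups. The rounding error per weight stays within about half of its group's scale.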

For most uses, this is probably not the quantization you want.
For 4-bit GPTQ quantizations, see TheBloke/WizardLM-7B-uncensored-GPTQ.
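
One concrete difference from the 4-bit versions is size. This rough estimate rests on several assumptions not stated in this card. Each group of 128 weights is assumed to store one fp16 scale and one zero point packed at the weight bit width. The model is assumed to have LLaMA-7B shapes (32 layers, hidden size 4096, MLP size 11008). Embeddings, lm_head, norms and g_idx are ignored. The helper names are hypothetical.

```python
# Rough packed-size estimate for GPTQ weights (assumptions listed above, not from this card):
# per group: one fp16 scale (16 bits) + one zero point at `bits` bits; LLaMA-7B layer shapes.

def effective_bits_per_weight(bits: int, group_size: int) -> float:
    """Bits stored per weight, including the amortized per-group scale and zero point."""
    return bits + (16 + bits) / group_size


def llama7b_linear_weights(layers: int = 32, hidden: int = 4096, mlp: int = 11008) -> int:
    """Weights in the quantized linear layers: q/k/v/o (hidden x hidden), gate/up/down (hidden x mlp)."""
    return layers * (4 * hidden * hidden + 3 * hidden * mlp)


def approx_gb(bits: int, group_size: int = 128) -> float:
    """Approximate size in GB (1e9 bytes) of the packed linear-layer weights."""
    return llama7b_linear_weights() * effective_bits_per_weight(bits, group_size) / 8 / 1e9


for b in (8, 4):
    print(f"{b}-bit, group size 128: ~{approx_gb(b):.2f} GB")
```

Under these assumptions the 8-bit weights come out at roughly twice the size of the 4-bit ones.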

Quantized using AutoGPTQ with the following config:

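config: dict = dict(
    quantize_config=dict(model_file_base_name='WizardLM-7B-Uncensored',  # base name of the saved weights file
                         bits=8,                 # 8-bit weights
                         desc_act=False,         # no act order
                         group_size=128,         # one scale/zero point per group of 128 weights
                         true_sequential=True),  # quantize each block's sub-layers sequentially
    use_safetensors=True                         # save weights in .safetensors format
)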
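
The AutoGPTQ loader reads quantize_config.json, so the quantize_config entries above are presumably what that file holds. This minimal sketch serializes just those entries using the hypothetical helpers quantize_config_json and write_quantize_config. The file AutoGPTQ itself writes may contain further keys, so treat this as an illustration and not a replacement for quantize.py.

```python
import json
import tempfile
from pathlib import Path

# The config from this card, copied verbatim.
config: dict = dict(
    quantize_config=dict(model_file_base_name='WizardLM-7B-Uncensored',
                         bits=8, desc_act=False, group_size=128, true_sequential=True),
    use_safetensors=True
)


def quantize_config_json(cfg: dict) -> str:
    """Serialize only cfg['quantize_config']; use_safetensors sits outside it, so it is not written."""
    return json.dumps(cfg["quantize_config"], indent=2, sort_keys=True)


def write_quantize_config(cfg: dict, out_dir: str) -> Path:
    """Hypothetical helper: write <out_dir>/quantize_config.json and return its path."""
    path = Path(out_dir) / "quantize_config.json"
    path.write_text(quantize_config_json(cfg))
    return path


with tempfile.TemporaryDirectory() as tmp:
    print(write_quantize_config(config, tmp).read_text())
```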

See quantize.py for the full script.

Tested for compatibility with:

  • WSL with the GPTQ-for-Llama triton branch.

The AutoGPTQ loader should read its configuration from quantize_config.json.
For GPTQ-for-Llama, use the following settings when loading:
wbits: 8
groupsize: 128
model_type: llama
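
The GPTQ-for-Llama settings above mirror fields in quantize_config.json: wbits corresponds to bits and groupsize to group_size. This minimal sketch shows that mapping with gptq_for_llama_settings as a hypothetical helper. model_type is passed in because it is not among the quantize_config fields shown on this card.

```python
import json


def gptq_for_llama_settings(quantize_config_text: str, model_type: str = "llama") -> dict:
    """Hypothetical helper: derive wbits/groupsize/model_type from quantize_config.json contents."""
    qc = json.loads(quantize_config_text)
    return {"wbits": qc["bits"], "groupsize": qc["group_size"], "model_type": model_type}


# Example contents matching the quantization config on this card.
example = '{"bits": 8, "group_size": 128, "desc_act": false, "true_sequential": true}'
print(gptq_for_llama_settings(example))
```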
