ehartford/WizardLM-7B-Uncensored quantized to 8-bit GPTQ with group size 128 and true sequential, without act order.
For most uses this probably isn't what you want.
For 4-bit GPTQ quantizations, see TheBloke/WizardLM-7B-uncensored-GPTQ.
Quantized using AutoGPTQ with the following config:
config: dict = dict(
    quantize_config=dict(model_file_base_name='WizardLM-7B-Uncensored',
                         bits=8, desc_act=False, group_size=128, true_sequential=True),
    use_safetensors=True
)
See quantize.py for the full script.
Tested for compatibility with:
- WSL with GPTQ-for-Llama triton branch.
The AutoGPTQ loader should read its configuration from quantize_config.json.
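As a rough illustration of loading with AutoGPTQ, the sketch below calls `AutoGPTQForCausalLM.from_quantized` without passing a quantize config, so the settings are expected to come from quantize_config.json. The `model_dir` argument (a local path or repo id for this model), the helper names, and the default device are assumptions for illustration; this has not been checked against the exact AutoGPTQ version used for the quantization.

```python
def from_quantized_kwargs(device: str = "cuda:0") -> dict:
    """Keyword arguments for loading; the weights are stored as safetensors."""
    return {"device": device, "use_safetensors": True}


def load_model(model_dir: str, device: str = "cuda:0"):
    """Load the quantized model from model_dir (hypothetical path or repo id).

    No quantize_config is passed, so AutoGPTQ is expected to read
    bits, group_size, etc. from quantize_config.json in model_dir.
    """
    # Imported lazily so the helper above can be used without AutoGPTQ installed.
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(model_dir, **from_quantized_kwargs(device))
```

If the loader does not pick up the weight file name on its own, passing `model_basename='WizardLM-7B-Uncensored'` to `from_quantized` may help, since that is the base name used in the quantization config above.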
For GPTQ-for-Llama use the following configuration when loading:
wbits: 8
groupsize: 128
model_type: llama
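The GPTQ-for-Llama values above correspond to fields in the AutoGPTQ config (`bits` to `wbits`, `group_size` to `groupsize`). A minimal sketch of that mapping follows, assuming quantize_config.json holds the same keys as the `quantize_config` dict shown earlier; the function names are hypothetical, and `model_type` is fixed to `llama` because it is not stored in that file.

```python
import json
from pathlib import Path


def gptq_for_llama_settings(quantize_config: dict) -> dict:
    """Map AutoGPTQ quantize_config fields to GPTQ-for-Llama loader settings."""
    return {
        "wbits": quantize_config["bits"],
        "groupsize": quantize_config["group_size"],
        "model_type": "llama",  # not stored in quantize_config.json; set by hand
    }


def settings_from_file(path: str) -> dict:
    """Read quantize_config.json at path and derive the loader settings."""
    return gptq_for_llama_settings(json.loads(Path(path).read_text()))


# The values from the quantization config used for this model.
example = {"bits": 8, "group_size": 128, "desc_act": False, "true_sequential": True}
settings = gptq_for_llama_settings(example)
```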