QwQ-32B-Preview-bnb-4bit

Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantizing the weights to 4 bits shrinks the model's memory footprint to roughly a quarter of its half-precision size, making it deployable on resource-constrained hardware such as a single consumer GPU.
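The memory saving can be estimated with back-of-the-envelope arithmetic — a rough sketch that counts weight storage only (it ignores quantization constants, activations, and the KV cache):

```python
# Rough weight-storage estimate for the 32.5B-parameter base model.
PARAMS = 32.5e9

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes needed to store the weights at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(16)  # half precision: ~65 GB
nf4_gb = weight_gb(4)    # 4-bit quantized: ~16 GB

print(f"FP16: {fp16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```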

Model Details

  • Quantization: 4-bit using Bits and Bytes (bnb)
  • Base Model: Qwen/QwQ-32B-Preview
  • Parameters: 32.5 billion
  • Context Length: Up to 32,768 tokens
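The pre-quantized checkpoint can be loaded directly with the transformers library — a minimal sketch, assuming a CUDA GPU with bitsandbytes and accelerate installed (the 4-bit quantization config ships inside the checkpoint, so no separate `BitsAndBytesConfig` is needed; the example prompt is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kurcontko/QwQ-32B-Preview-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The bitsandbytes 4-bit config is stored with the weights, so loading
# the checkpoint restores the quantized model as-is.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread layers across available devices
)

prompt = "Explain, step by step, why 0.1 + 0.2 != 0.3 in floating point."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```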
Checkpoint
  • Format: Safetensors
  • Stored parameters: 17.7B (4-bit weights are packed into U8 tensors, so the stored count is lower than the 32.5B logical parameters)
  • Tensor types: F32, BF16, U8

Model tree

kurcontko/QwQ-32B-Preview-bnb-4bit is a quantized derivative of Qwen/QwQ-32B-Preview, which is itself based on Qwen/Qwen2.5-32B.