vLLM

- For 24GB VRAM:
  - `max-model-len < 4096` with `marlin_awq` quantization: not available
  - `max-model-len: 10240` (2048 + 8192) with `awq` quantization, as in the command below
```bash
vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ \
  --max-model-len 10240 \
  --quantization awq \
  --dtype half \
  --port 8000 \
  --gpu-memory-utilization 0.99 \
  --enforce-eager
```
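Once the server is up, it exposes vLLM's OpenAI-compatible API on the configured port. Below is a minimal client sketch using the `openai` Python package; the base URL, placeholder API key, and prompt are illustrative, not part of the model card.

```python
# Minimal client sketch: assumes the `vllm serve` command above is running
# locally on port 8000 and exposing the OpenAI-compatible /v1 route.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not check the key by default
)

response = client.chat.completions.create(
    model="werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ",
    messages=[
        {"role": "user", "content": "Explain AWQ quantization in one sentence."}
    ],
    max_tokens=512,  # prompt + output must fit the 10240-token limit set above
)
print(response.choices[0].message.content)
```

Note that `--gpu-memory-utilization 0.99` leaves almost no VRAM headroom on a 24GB card, so keep other GPU workloads off the device while serving.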
Safetensors: 5.73B params; tensor types I32, BF16, FP16

Model tree for werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ

- Base model: Qwen/Qwen2.5-32B
- Quantized: this model