vLLM

- For 24GB VRAM:
  - `max-model-len < 4096` with `marlin_awq` quantization: not available
  - `max-model-len: 10240` (2048 + 8192) with `awq` quantization, as in the command below
```bash
vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ \
  --max-model-len 10240 \
  --quantization awq \
  --dtype half \
  --port 8000 \
  --gpu-memory-utilization 0.99 \
  --enforce-eager
```
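Once the server is up, it exposes vLLM's OpenAI-compatible API on the configured port. Below is a minimal client sketch using the `openai` Python package; the base URL, placeholder API key, and prompt are illustrative, not part of the model card.

```python
# Minimal client sketch: assumes the `vllm serve` command above is running
# locally on port 8000 and exposing the OpenAI-compatible /v1 route.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not check the key by default
)

response = client.chat.completions.create(
    model="werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ",
    messages=[
        {"role": "user", "content": "Explain AWQ quantization in one sentence."}
    ],
    max_tokens=512,  # prompt + output must fit the 10240-token limit set above
)
print(response.choices[0].message.content)
```

Note that `--gpu-memory-utilization 0.99` leaves almost no VRAM headroom on a 24GB card, so keep other GPU workloads off the device while serving.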
Safetensors: 5.73B params; tensor types I32, BF16, FP16

Model tree for werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ

- Base model: Qwen/Qwen2.5-32B
- Quantized: this model