How would one serve this model using vllm?

#1
by bjodah - opened

Hi! I'm new to the world of LLMs, so I apologize in advance if there is some silly misunderstanding on my part, but I tried to host this model (the 4-bit quant) locally in an OCI container on a machine with an RTX 3090 (24 GB VRAM).

I passed these flags to vllm: --model unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit --dtype bfloat16 --load_format bitsandbytes --quantization bitsandbytes
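For completeness, this is roughly the full command those flags correspond to (a sketch only, assuming the OpenAI-compatible server entrypoint inside the container):

```bash
# Sketch of the invocation described above; the flags are exactly the ones listed.
python -m vllm.entrypoints.openai.api_server \
    --model unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit \
    --dtype bfloat16 \
    --load_format bitsandbytes \
    --quantization bitsandbytes
```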

But I got an assertion error about mismatched shapes of param_data and loaded_weight (a vLLM implementation detail). Upon googling the issue, I found a similar report on the vLLM GitHub issues page:
https://github.com/vllm-project/vllm/issues/12682

Unsloth AI org

Currently dynamic quants aren't supported, but they will be soon. You can serve the standard BnB 4-bit one instead: https://huggingface.co/unsloth/Mistral-Small-24B-Instruct-2501-bnb-4bit
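If it helps, the flags from the original post should carry over to that repo unchanged; a sketch (the context-length cap is only an assumption, added to keep the KV cache comfortably within 24 GB of VRAM):

```bash
# Sketch: same invocation, pointed at the standard (non-dynamic) BnB 4-bit repo.
python -m vllm.entrypoints.openai.api_server \
    --model unsloth/Mistral-Small-24B-Instruct-2501-bnb-4bit \
    --dtype bfloat16 \
    --load_format bitsandbytes \
    --quantization bitsandbytes \
    --max_model_len 8192
```

The `--max_model_len 8192` cap is an assumption for a 24 GB card; drop it if the full context length fits in memory for your setup.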

shimmyshimmer changed discussion status to closed
shimmyshimmer changed discussion status to open
