Pixtral-12B-2409: 2:4 sparse

A 2:4 sparse version of mistral-community/pixtral-12b, produced with the kylesayrs/gptq-partition branch of LLM Compressor for optimised inference on vLLM. In the 2:4 (semi-structured) sparsity pattern, two of every four consecutive weights are pruned to zero, which NVIDIA Ampere and later GPUs can accelerate with sparse tensor cores.
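For reference, 2:4 sparsity is typically applied with LLM Compressor's one-shot SparseGPT flow. Below is a minimal sketch under assumed settings: the calibration dataset, sample counts, and exact import paths are illustrative and may differ on the kylesayrs/gptq-partition branch actually used for this model.

```python
# Rough sketch of one-shot 2:4 pruning with LLM Compressor.
# Import paths and arguments may vary by version/branch; the
# dataset and hyperparameters below are illustrative assumptions.
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

recipe = SparseGPTModifier(
    sparsity=0.5,          # 50% of weights pruned overall...
    mask_structure="2:4",  # ...two of every four consecutive weights
    targets="Linear",
    ignore=["lm_head"],
)

oneshot(
    model="mistral-community/pixtral-12b",
    dataset="open_platypus",       # assumed calibration dataset
    recipe=recipe,
    max_seq_length=2048,           # assumed calibration settings
    num_calibration_samples=512,
    output_dir="pixtral-12b-2409-2of4-sparse",
)
```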

Example vLLM usage

```shell
vllm serve nintwentydo/pixtral-12b-2409-2of4-sparse --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
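Once the server is up, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A minimal sketch using the openai Python client; the image URL is a placeholder:

```python
from openai import OpenAI

# vllm serve exposes an OpenAI-compatible endpoint; the API key is unused
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-2of4-sparse",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # placeholder URL; up to 4 images per prompt
                    # with the --limit-mm-per-prompt setting above
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```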

If you want a more advanced/fully featured chat template, you can use this Jinja template and pass it to vllm serve with the --chat-template flag.
