---
tags:
- vllm
- int4
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: image-text-to-text
license: apache-2.0
library_name: vllm
base_model:
- mistral-community/pixtral-12b
- mistralai/Pixtral-12B-2409
base_model_relation: quantized
---

# Pixtral-12B-2409: int4 Weight Quant

W4A16 quant of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b) using the [kylesayrs/gptq-partition branch of LLM Compressor](https://github.com/vllm-project/llm-compressor/tree/kylesayrs/gptq-partition) for optimised inference on vLLM.

- `vision_tower` kept at FP16
- `language_model` weights quantized to 4-bit
- Calibrated on 512 Flickr samples

## Example vLLM usage

```
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```

If you want a more advanced/fully featured chat template, you can use [this jinja template](https://raw.githubusercontent.com/nintwentydo/tabbyAPI/refs/heads/main/templates/pixtral12b.jinja).
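For example, assuming you have saved the template locally as `pixtral12b.jinja`, you can pass it to the server with vLLM's `--chat-template` flag:

```
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 \
  --max-model-len 131072 \
  --limit-mm-per-prompt 'image=4' \
  --chat-template pixtral12b.jinja
```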
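Once running, the server exposes an OpenAI-compatible API, so images can be sent as `image_url` content parts. A minimal sketch using the `openai` Python client (the base URL is vLLM's default; the image URL is a placeholder):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server; no real API key is required by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-W4A16-G128",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                # Placeholder image URL; up to 4 images per prompt with the
                # --limit-mm-per-prompt setting shown above.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```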
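For reference, quants like this are typically produced with LLM Compressor's `oneshot` entrypoint plus a `GPTQModifier` that skips the vision modules. The sketch below assumes the standard upstream API; the calibration-set name, ignore patterns, and preprocessing are illustrative assumptions, and the actual run used the kylesayrs/gptq-partition branch, so details may differ:

```python
from datasets import load_dataset
from transformers import AutoProcessor, LlavaForConditionalGeneration

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"

# Pixtral loads through the Llava architecture in transformers.
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Illustrative calibration data; the card only says "512 flickr samples".
# Multimodal calibration generally needs model-specific preprocessing into
# input tensors, which is omitted here for brevity.
ds = load_dataset("lmms-lab/flickr30k", split="test[:512]")

# W4A16 GPTQ on the language model only: ignore lm_head and everything in
# the vision tower / projector so they stay at FP16.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
)

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save in compressed-tensors format so vLLM can load the W4A16 checkpoint.
model.save_pretrained("pixtral-12b-2409-W4A16-G128", save_compressed=True)
processor.save_pretrained("pixtral-12b-2409-W4A16-G128")
```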