metadata

tags:
  - vllm
  - int4
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
pipeline_tag: image-text-to-text
license: apache-2.0
library_name: vllm
base_model:
  - mistral-community/pixtral-12b
  - mistralai/Pixtral-12B-2409
base_model_relation: quantized

Pixtral-12B-2409: int4 Weight Quant

W4A16 quant of mistral-community/pixtral-12b using kylesayrs/gptq-partition branch of LLM Compressor for optimised inference on VLLM.

vision_tower kept at FP16. language_model weights quantized to 4bit.

Calibrated on 512 flickr samples.

Example VLLM usage

vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'

If you want a more advanced/fully featured chat template you can use this jinja template