---
tags:
- vllm
- int4
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: image-text-to-text
license: apache-2.0
library_name: vllm
base_model:
- mistral-community/pixtral-12b
- mistralai/Pixtral-12B-2409
base_model_relation: quantized
---
# Pixtral-12B-2409: int4 Weight Quant
W4A16 quant of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b) using the [kylesayrs/gptq-partition branch of LLM Compressor](https://github.com/vllm-project/llm-compressor/tree/kylesayrs/gptq-partition), for optimised inference on vLLM.
The vision_tower is kept at FP16; the language_model weights are quantized to 4-bit.
Calibrated on 512 Flickr samples.
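
For reference, a quantization run along these lines can be reproduced with LLM Compressor's `oneshot` API. This is only a minimal sketch, not the exact script used for this repo: the calibration dataset name, sequence length, and `ignore` list are assumptions, and the kylesayrs/gptq-partition branch may expose slightly different options (multimodal models typically also need a traceable model class and a custom data collator).

```python
# Minimal sketch of a W4A16 GPTQ quantization with LLM Compressor.
# Assumptions: standard llm-compressor oneshot API; dataset wiring,
# ignore list and sequence length are illustrative, not the exact recipe.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    # Keep the vision tower (and lm_head) at full precision;
    # only language_model Linear weights get 4-bit quantization.
    ignore=["re:vision_tower.*", "re:multi_modal_projector.*", "lm_head"],
)

oneshot(
    model="mistral-community/pixtral-12b",
    dataset="flickr30k",              # assumed calibration dataset name
    recipe=recipe,
    max_seq_length=2048,              # assumed calibration sequence length
    num_calibration_samples=512,
    output_dir="pixtral-12b-2409-W4A16-G128",
)
```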
Example vLLM usage:
```
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
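
Once running, the server exposes an OpenAI-compatible API, so the quant can be queried with the standard `openai` client. A minimal sketch (port and API key assume vLLM defaults; the image URL is just a placeholder):

```python
# Query the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-W4A16-G128",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    # Placeholder URL; swap in a real image.
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```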
If you want a more advanced, fully featured chat template, you can use [this jinja template](https://raw.githubusercontent.com/nintwentydo/tabbyAPI/refs/heads/main/templates/pixtral12b.jinja).