nintwentydo
/

pixtral-12b-2409-2of4-sparse

Image-Text-to-Text

compressed-tensors

Model card Files Files and versions Community

nintwentydo commited on 7 days ago

Commit

c60426f

·

verified ·

1 Parent(s): 2371c2d

Create README.md

Files changed (1) hide show

README.md +33 -0

README.md ADDED Viewed

	@@ -0,0 +1,33 @@

+---
+tags:
+- vllm
+- sparsity
+language:
+- en
+- de
+- fr
+- it
+- pt
+- hi
+- es
+- th
+pipeline_tag: image-text-to-text
+license: apache-2.0
+library_name: vllm
+base_model:
+- mistral-community/pixtral-12b
+- mgoin/pixtral-12b
+- mistralai/Pixtral-12B-2409
+base_model_relation: quantized
+---
+# Pixtral-12B-2409: 2:4 sparse
+2:4 sparse version of [mistral-community/pixtral-12b](https://huggingface.co/mgoin/pixtral-12b) using [kylesayrs/gptq-partition branch of LLM Compressor](https://github.com/vllm-project/llm-compressor/tree/kylesayrs/gptq-partition) for optimised inference on VLLM.
+Example VLLM usage
+```
+vllm serve nintwentydo/pixtral-12b-2409-2of4-sparse --max-model-len 131072 --limit-mm-per-prompt 'image=4'
+```
+If you want a more advanced/fully featured chat template you can use [this jinja template](https://raw.githubusercontent.com/nintwentydo/tabbyAPI/refs/heads/main/templates/pixtral12b.jinja)