A Mistral-7B pruned50 model, quantized with AutoGPTQ for use with the Marlin kernel.

See my tutorial for instructions on running this model:

https://vilsonrodrigues.medium.com/sparse-quantize-and-serving-llms-with-neuralmagic-autogptq-and-vllm-03961b72ec3a
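
For a quick orientation before reading the tutorial, here is a minimal sketch of how a Marlin-format AutoGPTQ checkpoint is typically loaded. It is not taken from the tutorial. It assumes `auto-gptq` (0.7 or newer, where `use_marlin` is available), `transformers`, and a CUDA GPU supported by the Marlin kernel. The helper names `marlin_load_kwargs`, `load_model`, and `generate` are hypothetical, and `model_id` is a placeholder that you must set to this repository's id.

```python
def marlin_load_kwargs(device="cuda:0"):
    """Keyword arguments for AutoGPTQ's from_quantized (hypothetical helper)."""
    return {
        "use_marlin": True,       # ask AutoGPTQ to use the Marlin kernel
        "use_safetensors": True,  # the weights are stored as safetensors
        "device": device,
    }


def load_model(model_id, device="cuda:0"):
    """Load tokenizer and quantized model; model_id is a placeholder."""
    # Imported lazily so the module can be imported without a GPU stack.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoGPTQForCausalLM.from_quantized(
        model_id, **marlin_load_kwargs(device)
    )
    return tokenizer, model


def generate(model_id, prompt, max_new_tokens=64, device="cuda:0"):
    """Generate a completion for a single prompt."""
    tokenizer, model = load_model(model_id, device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(model_id, "Hello")` with the repository id filled in should return the decoded completion. The tutorial linked above remains the reference for serving the model, including with vLLM.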
