---
license: apache-2.0
tags:
- marlin
- gptq
- pruned
---
A Mistral-7B model pruned to 50% sparsity (pruned50) and quantized with AutoGPTQ for the Marlin kernel.

Please see my tutorial on how to run this model:

https://vilsonrodrigues.medium.com/sparse-quantize-and-serving-llms-with-neuralmagic-autogptq-and-vllm-03961b72ec3a
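As a quick starting point, here is a minimal sketch of serving the model with vLLM's Marlin quantization backend. The repo id below is a placeholder assumption; replace it with this model's actual Hugging Face Hub id, and see the tutorial above for full setup details.

```python
# Hypothetical sketch: loading a Marlin-quantized GPTQ model with vLLM.
# Requires a CUDA GPU with compute capability >= 8.0 for the Marlin kernel.
from vllm import LLM, SamplingParams

# Placeholder repo id -- substitute the actual model id from the Hub.
llm = LLM(
    model="<your-username>/Mistral-7B-pruned50-GPTQ-Marlin",
    quantization="marlin",
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain sparse quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Running this downloads the weights from the Hub on first use; the Marlin kernel then accelerates the INT4 GPTQ matmuls at inference time.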