How to serve the model with parallelism

#13
by lone17 - opened

Hello again. I'm wondering what backend you use to serve this model. I've looked at vLLM and llama.cpp, but neither supports tensor parallelism for GGUF at the moment.
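
For reference, this is the kind of setup I mean, a minimal sketch of tensor-parallel serving with vLLM on a regular (non-GGUF, e.g. safetensors) checkpoint; the repo name and GPU count below are placeholders, not this model:

```python
# Minimal sketch: tensor-parallel serving with vLLM's offline API.
# Assumes a non-GGUF checkpoint; "org/model-name" is a placeholder.
from vllm import LLM, SamplingParams

# tensor_parallel_size shards each layer's weights across the given
# number of GPUs (tensor parallelism), rather than splitting by layer.
llm = LLM(model="org/model-name", tensor_parallel_size=2)

params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

The issue is that this path doesn't work for GGUF quantized weights, hence the question about which backend you use.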
