Running on TGI or vLLM?

#6
by AmanRai7 - opened

Hi, is it possible to run this model using TGI or vLLM?

Could anyone share a resource for this?

lewtun
Hugging Face H4 org

Hi @AmanRai7 since Zephyr is a Mistral fine-tune, you can run it on TGI easily with the Docker container as follows:

model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.1.1 --model-id $model
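
Once the container is up, you can send requests to TGI's /generate endpoint on the mapped port. A minimal sketch (the prompt below follows Zephyr's chat template; adapt it to your use case):

curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"<|system|>\nYou are a friendly chatbot.</s>\n<|user|>\nWhat is TGI?</s>\n<|assistant|>\n","parameters":{"max_new_tokens":128}}'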

See the docs for more details: https://huggingface.co/docs/text-generation-inference/quicktour
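
If you'd rather use vLLM, a minimal sketch is to launch its OpenAI-compatible server (assuming vLLM is installed via pip; the exact entrypoint and flags can vary between versions):

pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model HuggingFaceH4/zephyr-7b-beta \
    --port 8000

You can then query it with any OpenAI-compatible client pointed at http://localhost:8000/v1.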

Hi @lewtun, thanks for the reply, but I was really looking for an option where I can build it from source for clouds like RunPod. Nevertheless, I was able to build and run it successfully. Cheers!

AmanRai7 changed discussion status to closed
