Running on TGI or vLLM?
#6
by AmanRai7 - opened
Hi, is it possible to run this model using TGI or vLLM?
Could anyone share a resource for this?
Hi @AmanRai7 since Zephyr is a Mistral fine-tune, you can run it on TGI easily with the Docker container as follows:
model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.1.1 \
    --model-id $model
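Once the container is up, you can sanity-check it with a request to TGI's /generate endpoint (this example follows the TGI quickstart; swap in your own prompt and parameters):

# Send a test generation request to the running server
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'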
See the docs for more details: https://huggingface.co/docs/text-generation-inference/quicktour
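As for vLLM (which the reply above doesn't cover): since Zephyr is a Mistral fine-tune, it should also load there. A minimal sketch, assuming a vLLM release with the OpenAI-compatible server (the exact launch flags may differ by version, so check the vLLM docs):

# Launch an OpenAI-compatible API server on port 8000
python -m vllm.entrypoints.openai.api_server \
    --model HuggingFaceH4/zephyr-7b-beta --port 8000

# Query it via the completions endpoint
curl http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "HuggingFaceH4/zephyr-7b-beta", "prompt": "What is Deep Learning?", "max_tokens": 64}'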
AmanRai7 changed discussion status to closed