Running on TGI or vLLM?
#6
by AmanRai7 - opened
Hi, is it possible to run this model using TGI or vLLM?
Could anyone share a resource for this?
Hi @AmanRai7 since Zephyr is a Mistral fine-tune, you can run it on TGI easily with the Docker container as follows:
model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.1.1 \
    --model-id $model
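Once the container is up, you can sanity-check it with a request to TGI's /generate endpoint (this example follows the TGI quickstart; swap in your own prompt and parameters):

# Send a test generation request to the running server
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'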
See the docs for more details: https://huggingface.co/docs/text-generation-inference/quicktour
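As for vLLM (which the reply above doesn't cover): since Zephyr is a Mistral fine-tune, it should also load there. A minimal sketch, assuming a vLLM release with the OpenAI-compatible server (the exact launch flags may differ by version, so check the vLLM docs):

# Launch an OpenAI-compatible API server on port 8000
python -m vllm.entrypoints.openai.api_server \
    --model HuggingFaceH4/zephyr-7b-beta --port 8000

# Query it via the completions endpoint
curl http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "HuggingFaceH4/zephyr-7b-beta", "prompt": "What is Deep Learning?", "max_tokens": 64}'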
AmanRai7 changed discussion status to closed