You can deploy the model with vLLM. The following script starts an OpenAI-compatible API server:

```bash
python -O -u -m vllm.entrypoints.openai.api_server \
    --host=127.0.0.1 \
    --port=8090 \
    --model=Melon/Meta-Llama-3-70B-Instruct-AutoAWQ-4bit \
    --tokenizer=meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size=1 \
    --quantization awq \
    --dtype half
```
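
Once the server is running, you can send requests to its OpenAI-compatible endpoint. Below is a minimal sketch using `curl` against the host, port, and model name from the script above; the prompt and sampling parameters are placeholders you would replace with your own.

```bash
# Query the chat completions endpoint exposed by the vLLM OpenAI API server.
# Host/port match the deployment script; adjust the message and parameters as needed.
curl http://127.0.0.1:8090/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Melon/Meta-Llama-3-70B-Instruct-AutoAWQ-4bit",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
        "temperature": 0.7
    }'
```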