# vLLM OpenAI-Compatible API Server

> References: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196

This `api_server.py` file is an exact copy of https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py

* The `HUGGING_FACE_HUB_TOKEN` environment variable must be set at runtime.

## Documentation about config

* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/utils.py#L1207-L1221

```shell
"serve,chat,complete",
"facebook/opt-12B",
'--config', 'config.yaml',
'-tp', '2'
```

The YAML config file is equivalent to passing command-line flags. For better documentation, consider passing the flag parameters defined here: https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237

Other arguments are the same as for the `LLM` class, such as `--max-model-len`, `--dtype`, or `--otlp-traces-endpoint`:

* https://github.com/vllm-project/vllm/blob/v0.6.4/vllm/config.py#L1061-L1086
* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/engine/arg_utils.py#L221-L913
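
To make the equivalence between a YAML config file and command-line flags concrete, here is a minimal Python sketch of the idea. It is not vLLM's actual implementation: the helper names `config_to_flags` and `expand_config_arg` are hypothetical, the config is passed in as an already-parsed dict rather than read from `config.yaml`, and the keys `port` and `tensor-parallel-size` are only illustrative values.

```python
from typing import Any, Dict, List


def config_to_flags(config: Dict[str, Any]) -> List[str]:
    """Turn a parsed config such as {"port": 12323} into ["--port", "12323"].

    Hypothetical helper for illustration; vLLM's own parser may treat
    some value types (for example booleans) differently.
    """
    flags: List[str] = []
    for key, value in config.items():
        flags.extend([f"--{key}", str(value)])
    return flags


def expand_config_arg(args: List[str], config: Dict[str, Any]) -> List[str]:
    """Replace the `--config <file>` pair in `args` with the config's flags."""
    if "--config" not in args:
        return list(args)
    index = args.index("--config")
    # Keep everything before `--config`, splice in the flags from the
    # config, then keep everything after the config file name.
    return args[:index] + config_to_flags(config) + args[index + 2:]


# Assumed contents of config.yaml, already parsed into a dict.
config = {"port": 12323, "tensor-parallel-size": 4}
args = ["serve", "facebook/opt-12B", "--config", "config.yaml", "-tp", "2"]

expanded = expand_config_arg(args, config)
print(expanded)
```

In this sketch the flags from the config are spliced in at the position of `--config`, so any flags that follow it on the command line (here `-tp 2`) still appear after the config values.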