---
title: Deploy VLLM
emoji: 🐢
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
---


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


Export the Poetry dependencies to a `requirements.txt` (used by the Docker build):

```shell
poetry export -f requirements.txt --output requirements.txt --without-hashes
```
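
The exported file can then be installed during the Docker build in the usual way, e.g. (a minimal sketch; the actual Dockerfile of this Space may differ):

```shell
pip install --no-cache-dir -r requirements.txt
```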


## vLLM OpenAI-Compatible API Server

> References: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196

This `api_server.py` file is an exact copy of https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py

* The `HUGGING_FACE_HUB_TOKEN` environment variable must be set at runtime.
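
On Hugging Face Spaces the token is typically stored as a repository secret, which the platform exposes to the container as an environment variable. For a local run, a sketch (the token value is a placeholder):

```shell
# Placeholder token; substitute your own. Needed for gated or private models.
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx
```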

## Documentation about the config file

* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/utils.py#L1207-L1221

The referenced lines show how command-line arguments (including `--config`) appear as an argument list before the config file is expanded:

```python
args = [
    "serve,chat,complete",
    "facebook/opt-12B",
    '--config', 'config.yaml',
    '-tp', '2'
]
```

The YAML config file is equivalent to passing the same values as flag params (see the example below). Consider passing flag params directly, as defined here, for better documentation:
https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237
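
As an illustration (the values here are arbitrary placeholders), a config file can be created and used like this, which is equivalent to passing `--port 12323 --tensor-parallel-size 4` directly:

```shell
# Write a sample config.yaml (placeholder values)
cat > config.yaml <<'EOF'
port: 12323
tensor-parallel-size: 4
EOF

# Equivalent to: vllm serve facebook/opt-12B --port 12323 --tensor-parallel-size 4
vllm serve facebook/opt-12B --config config.yaml
```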

Other arguments are the same as for the `LLM` class, such as `--max-model-len`, `--dtype`, or `--otlp-traces-endpoint`:
* https://github.com/vllm-project/vllm/blob/v0.6.4/vllm/config.py#L1061-L1086
* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/engine/arg_utils.py#L221-L913
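
Putting it together, a minimal launch sketch (the model name and values are placeholders; Spaces conventionally expect the app to listen on port 7860):

```shell
python api_server.py \
    --model facebook/opt-125m \
    --host 0.0.0.0 \
    --port 7860 \
    --max-model-len 2048 \
    --dtype auto
```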