sailor2-3b-chat / README.md
yusufs's picture
feat(endpoint): add prefix /api on each endpoint
5f3bf21
|
raw
history blame
1.59 kB
metadata
title: Deploy VLLM
emoji: 🐢
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

poetry export -f requirements.txt --output requirements.txt --without-hashes

VLLM OpenAI Compatible API Server

References: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196

Fixes:

This api_server.py file is exact copy version from https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py

  • The HUGGING_FACE_HUB_TOKEN must exist during runtime.

Documentation about config

"serve,chat,complete",
"facebook/opt-12B",
'--config', 'config.yaml',
'-tp', '2'

The yaml is equivalent with argument flag params. Consider passing using flag params that defined here for better documentation: https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237

Other arguments is the same as LLM class such as --max-model-len, --dtype, or --otlp-traces-endpoint