docs(add-comment): add comment
runner.sh
CHANGED
@@ -43,8 +43,17 @@ printf "Running %s using vLLM OpenAI compatible API Server at port %s\n" $MODEL_
 # “Is…”
 # “Yes…!!!…?”
 # “Forty-two,” said Deep Thought, with infinite majesty and calm.”
-
+#
 # ―Douglas Adams, The Hitchhiker’s Guide to the Galaxy
+#
+#
+# For sail/Sailor-4B-Chat, if we only need 26576 tokens, it can be run on hardware with lower specs:
+# Nvidia 1x L4: 8 vCPU • 30 GB RAM • 24 GB VRAM (US$ 0.80/hour, or US$ 576/month assuming 720 hours)
+# A larger token count requires more VRAM; for example, 32768 tokens requires at minimum:
+# Nvidia 1x L40S: 8 vCPU • 62 GB RAM • 48 GB VRAM (US$ 1.80/hour, or US$ 1,296/month assuming 720 hours)
+#
+# For meta-llama/Llama-3.2-3B-Instruct, if we only need 32768 tokens, it can be run on hardware with lower specs:
+# Nvidia T4 small: 4 vCPU • 15 GB RAM • 16 GB VRAM (US$ 0.40/hour, or US$ 288/month assuming 720 hours)
 # Run the Python script with the determined values
 # Supported tasks: {'generate', 'embedding'}
 python -u /app/openai_compatible_api_server.py \
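The added comments tie the supported token count to VRAM; the main reason a larger context needs more VRAM is the attention KV cache, which grows linearly with the number of tokens. A minimal sketch of that arithmetic, assuming a Llama-3.2-3B-like configuration (28 layers, 8 KV heads, head dim 128, fp16); these figures are illustrative assumptions, not values stated in the diff:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, tokens, dtype_bytes=2):
    """Bytes of KV cache: one K and one V vector per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * tokens

# Assumed Llama-3.2-3B-like config: 28 layers, 8 KV heads, head dim 128, fp16 (2 bytes)
per_token = kv_cache_bytes(28, 8, 128, 1)       # bytes of cache per generated/context token
cache_32k = kv_cache_bytes(28, 8, 128, 32768)   # cache for a full 32768-token context

print(per_token)            # 114688 bytes, i.e. ~112 KiB per token
print(cache_32k / 2**30)    # 3.5 GiB on top of the model weights
```

Under these assumptions a 32768-token context adds roughly 3.5 GiB of cache on top of the ~6 GiB of fp16 weights for a 3B model, which is consistent with the comment that a 16 GB T4 suffices at that context length while larger contexts or models push toward the 24 GB and 48 GB cards.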