docs(add-comment): add comment
runner.sh
CHANGED
@@ -43,8 +43,17 @@ printf "Running %s using vLLM OpenAI compatible API Server at port %s\n" $MODEL_
 # “Is…”
 # “Yes…!!!…?”
 # “Forty-two,” said Deep Thought, with infinite majesty and calm.”
-
+#
 # ―Douglas Adams, The Hitchhiker’s Guide to the Galaxy
+#
+#
+# For sail/Sailor-4B-Chat, if we only need 26576 tokens, it can be run on hardware with lower specs:
+# Nvidia 1x L4: 8 vCPU • 30 GB RAM • 24 GB VRAM (US$ 0.80/hour, or US$ 576/month assuming 720 hours)
+# A larger token count requires more VRAM; for example, 32768 tokens requires at minimum:
+# Nvidia 1x L40S: 8 vCPU • 62 GB RAM • 48 GB VRAM (US$ 1.80/hour, or US$ 1,296/month assuming 720 hours)
+#
+# For meta-llama/Llama-3.2-3B-Instruct, if we only need 32768 tokens, it can be run on hardware with lower specs:
+# Nvidia T4 small: 4 vCPU • 15 GB RAM • 16 GB VRAM (US$ 0.40/hour, or US$ 288/month assuming 720 hours)
 # Run the Python script with the determined values
 # Supported tasks: {'generate', 'embedding'}
 python -u /app/openai_compatible_api_server.py \
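The added comments tie the supported token count to VRAM; the main reason a larger context needs more VRAM is the attention KV cache, which grows linearly with the number of tokens. A minimal sketch of that arithmetic, assuming a Llama-3.2-3B-like configuration (28 layers, 8 KV heads, head dim 128, fp16); these figures are illustrative assumptions, not values stated in the diff:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, tokens, dtype_bytes=2):
    """Bytes of KV cache: one K and one V vector per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * tokens

# Assumed Llama-3.2-3B-like config: 28 layers, 8 KV heads, head dim 128, fp16 (2 bytes)
per_token = kv_cache_bytes(28, 8, 128, 1)       # bytes of cache per generated/context token
cache_32k = kv_cache_bytes(28, 8, 128, 32768)   # cache for a full 32768-token context

print(per_token)            # 114688 bytes, i.e. ~112 KiB per token
print(cache_32k / 2**30)    # 3.5 GiB on top of the model weights
```

Under these assumptions a 32768-token context adds roughly 3.5 GiB of cache on top of the ~6 GiB of fp16 weights for a 3B model, which is consistent with the comment that a 16 GB T4 suffices at that context length while larger contexts or models push toward the 24 GB and 48 GB cards.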