yusufs committed
Commit 22ac900 · Parent: 148829b

docs(add-comment): add comment

Files changed (1): runner.sh +10 −1
runner.sh
@@ -43,8 +43,17 @@ printf "Running %s using vLLM OpenAI compatible API Server at port %s\n" $MODEL_
 # “Is…”
 # “Yes…!!!…?”
 # “Forty-two,” said Deep Thought, with infinite majesty and calm.”
-
+#
 # ―Douglas Adams, The Hitchhiker’s Guide to the Galaxy
+#
+#
+# For sail/Sailor-4B-Chat, if we only need 26576 tokens, it can run on lower-spec hardware:
+# Nvidia 1xL4: 8 vCPU · 30 GB RAM · 24 GB VRAM (US$ 0.80/hour, or US$ 576/month assuming 720 hours)
+# Larger token sizes require more VRAM; for example, 32768 tokens require at minimum:
+# Nvidia 1xL40S: 8 vCPU · 62 GB RAM · 48 GB VRAM (US$ 1.80/hour, or US$ 1,296/month assuming 720 hours)
+#
+# For meta-llama/Llama-3.2-3B-Instruct, if we only need 32768 tokens, it can run on lower-spec hardware:
+# Nvidia T4 small: 4 vCPU · 15 GB RAM · 16 GB VRAM (US$ 0.40/hour, or US$ 288/month assuming 720 hours)
 # Run the Python script with the determined values
 # Supported tasks: {'generate', 'embedding'}
 python -u /app/openai_compatible_api_server.py \
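The monthly figures in the added comments are hourly rate × 720 hours. A minimal shell sketch of that arithmetic (the `monthly` helper is illustrative, not part of runner.sh; rates are passed in cents to stay in POSIX integer arithmetic):

```shell
#!/bin/sh
# Monthly cost = hourly rate x 720 hours.
# Argument is the hourly rate in US cents, e.g. 80 for US$0.80/hour.
monthly() {
    printf 'US$ %d\n' $(( $1 * 720 / 100 ))
}

monthly 80    # Nvidia 1xL4 at US$0.80/hour  -> US$ 576
monthly 180   # Nvidia 1xL40S at US$1.80/hour -> US$ 1296
monthly 40    # Nvidia T4 small at US$0.40/hour -> US$ 288
```

Working in cents avoids floating-point in the shell; the printed totals match the per-month numbers quoted in the commit's comments.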