yusufs committed on
Commit 78963b9 · 1 Parent(s): 8132d1f

fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.

Files changed (1)
  1. runner.sh +1 -1
runner.sh CHANGED
@@ -59,7 +59,7 @@ python -u /app/openai_compatible_api_server.py \
   --port 7860 \
   --max-num-batched-tokens 32768 \
   --max-model-len 32768 \
-  --dtype bfloat16 \
+  --dtype float16 \
   --enforce-eager \
   --gpu-memory-utilization 0.9 \
   --enable-prefix-caching \
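
The rule quoted in the commit message (bfloat16 needs compute capability 8.0 or higher; the Tesla T4 reports 7.5) could also be checked at launch time instead of hard-coding the dtype. The following is a minimal sketch, not part of this commit: `pick_dtype` and `detect_cap` are hypothetical helper names, and it assumes the installed `nvidia-smi` supports the `compute_cap` query field. If detection fails, it falls back to float16.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a compute capability string (e.g. "7.5")
# to a dtype. bfloat16 is chosen only when the major version is >= 8.
pick_dtype() {
  local cap="$1"
  local major="${cap%%.*}"   # "7.5" -> "7", "8.0" -> "8"
  if [ "$major" -ge 8 ] 2>/dev/null; then
    echo "bfloat16"
  else
    # Covers older GPUs and the case where detection returned nothing.
    echo "float16"
  fi
}

# Hypothetical helper: read the compute capability of the first GPU.
# Assumption: this nvidia-smi build supports the compute_cap field.
detect_cap() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1
}

# Usage inside a launcher script such as runner.sh:
#   DTYPE="$(pick_dtype "$(detect_cap)")"
#   python -u /app/openai_compatible_api_server.py ... --dtype "${DTYPE}" ...
```

On a T4 this resolves to float16, the same value the commit sets by hand. The fixed value in the diff stays simpler when the deployment hardware is known in advance.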