fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
runner.sh
CHANGED
@@ -59,7 +59,7 @@ python -u /app/openai_compatible_api_server.py \
   --port 7860 \
   --max-num-batched-tokens 32768 \
   --max-model-len 32768 \
-  --dtype
+  --dtype float16 \
   --enforce-eager \
   --gpu-memory-utilization 0.9 \
   --enable-prefix-caching \
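
The change above hardcodes `float16` because the Tesla T4 (compute capability 7.5) falls below the 8.0 threshold named in the commit message. As a rough sketch only, the same rule could be applied at launch time so the script also works on newer GPUs. The helper names `pick_dtype` and `detect_compute_cap` are hypothetical and not part of `runner.sh`, and the sketch assumes the installed `nvidia-smi` supports the `compute_cap` query field, which older drivers may not.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in runner.sh): map a compute capability string
# such as "7.5" or "8.0" to a dtype, using the 8.0 threshold from the
# commit message.
pick_dtype() {
  local cap="$1"
  local major="${cap%%.*}"   # keep only the part before the first dot
  # A non-numeric or empty value makes the test fail, so we fall through
  # to float16, the conservative choice.
  if [ "$major" -ge 8 ] 2>/dev/null; then
    echo "bfloat16"
  else
    echo "float16"
  fi
}

# Hypothetical helper: read the compute capability of the first GPU.
# Assumption: this nvidia-smi build accepts the compute_cap query field.
# Prints nothing if nvidia-smi is missing or the query fails.
detect_compute_cap() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1
}
```

With these defined, the hardcoded line could plausibly become `--dtype "$(pick_dtype "$(detect_compute_cap)")" \`. When detection fails the result is `float16`, which matches what this commit sets explicitly.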