Space: yusufs/sailor2-3b-chat

yusufs committed on Apr 16

Commit 78963b9 (parent: 8132d1f)

fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.

Files changed (1)

runner.sh CHANGED

@@ -59,7 +59,7 @@ python -u /app/openai_compatible_api_server.py \
     --port 7860 \
     --max-num-batched-tokens 32768 \
     --max-model-len 32768 \
-    --dtype bfloat16 \
+    --dtype float16 \
     --enforce-eager \
     --gpu-memory-utilization 0.9 \
     --enable-prefix-caching \
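The commit hard-codes `--dtype float16`, which fixes the Tesla T4 but also gives up bfloat16 on GPUs that support it. The snippet below is a minimal sketch, assuming bash, of deriving the value from the GPU's compute capability instead. `pick_dtype` is a hypothetical helper, not part of runner.sh, and the 8.0 threshold is taken from the error message in the commit. It falls back to float16 when the input is not a recognizable version string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in runner.sh): map a compute capability string
# such as "7.5" (Tesla T4) or "8.0" to a --dtype value.
# Threshold assumed from the error message: bfloat16 needs capability >= 8.0.
pick_dtype() {
  local cap="$1"
  local major="${cap%%.*}"   # keep only the major version, e.g. "7" from "7.5"
  case "$major" in
    ''|*[!0-9]*)
      # Empty or non-numeric input: use the safe choice that works on older GPUs.
      echo "float16"
      return
      ;;
  esac
  if [ "$major" -ge 8 ]; then
    echo "bfloat16"
  else
    echo "float16"
  fi
}
```

To feed it a real value, recent NVIDIA drivers can report the capability with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (an assumption about the installed driver; older drivers do not expose this field), for example `DTYPE=$(pick_dtype "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)")`, and then pass `--dtype "$DTYPE"` in place of the hard-coded value.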