OOM with vllm
#48
by willowill5 - opened
I get an OOM even on an A100 80GB when deploying with:

```shell
python -m vllm.entrypoints.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --dtype half
```

I have also tried the flags `--max-model-len 8192` and `--gpu-memory-utilization 0.8`.

Has anyone else run into this? Thanks!
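For what it's worth, this OOM is expected: Mixtral-8x7B is a mixture-of-experts model whose *total* parameter count (roughly 46.7B per the model card) is what must fit in GPU memory, even though only a fraction is active per token. A quick back-of-the-envelope check (the 46.7B figure is an assumption from the model card, not from this thread):

```python
# Rough weight-memory estimate for Mixtral-8x7B in half precision.
# Assumption: ~46.7B total parameters (model card figure); fp16 is 2 bytes each.
TOTAL_PARAMS = 46.7e9
BYTES_PER_PARAM = 2  # --dtype half

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
print(f"fp16 weights alone: ~{weights_gb:.0f} GB")
# Weights alone exceed an A100's 80 GB, before the KV cache or activations,
# so no --max-model-len or --gpu-memory-utilization setting can save it.
```

Typical workarounds are sharding across two or more GPUs with vLLM's `--tensor-parallel-size` flag, or serving a quantized (e.g. AWQ/GPTQ) variant of the model so the weights fit on a single card.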
willowill5 changed discussion status to closed