OOM with vllm
#48
by willowill5 - opened
I get an OOM even on an A100 80GB when deploying with:

```shell
python -m vllm.entrypoints.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --dtype half
```

I have also tried the flags `--max-model-len 8192` and `--gpu-memory-utilization 0.8`.

Has anyone else run into this? Thanks!
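For what it's worth, this OOM is expected: Mixtral-8x7B is a mixture-of-experts model whose *total* parameter count (roughly 46.7B per the model card) is what must fit in GPU memory, even though only a fraction is active per token. A quick back-of-the-envelope check (the 46.7B figure is an assumption from the model card, not from this thread):

```python
# Rough weight-memory estimate for Mixtral-8x7B in half precision.
# Assumption: ~46.7B total parameters (model card figure); fp16 is 2 bytes each.
TOTAL_PARAMS = 46.7e9
BYTES_PER_PARAM = 2  # --dtype half

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
print(f"fp16 weights alone: ~{weights_gb:.0f} GB")
# Weights alone exceed an A100's 80 GB, before the KV cache or activations,
# so no --max-model-len or --gpu-memory-utilization setting can save it.
```

Typical workarounds are sharding across two or more GPUs with vLLM's `--tensor-parallel-size` flag, or serving a quantized (e.g. AWQ/GPTQ) variant of the model so the weights fit on a single card.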
willowill5 changed discussion status to closed