GPU memory temporary buffer size too high.

#1
by bhushangawde32 - opened

I tested several converted DeepSeek R1 Distill Qwen 1.5B models in the MLCChat app on an iPhone 15 Plus and a Google Pixel 8 Pro. All of them require far more GPU memory than the devices provide, so loading fails on both iOS and Android.
Is there a way to make this model run on a smartphone?

FATAL EXCEPTION: Thread-4
Process: ai.mlc.mlcchat, PID: 14195
org.apache.tvm.Base$TVMError: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, which is less than the sum of model weight size (1059.693 MB) and temporary buffer size (11891.183 MB).
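
To make the size of the gap concrete, here is a minimal Python sketch that pulls the three figures out of the error text above and does the arithmetic. The helper name `parse_memory_report` and the dictionary keys are hypothetical and exist only for this illustration; nothing here calls into MLC or TVM.

```python
import re

# Error text copied verbatim from the log above.
ERROR = (
    "Insufficient GPU memory error: The available single GPU memory is "
    "4352.000 MB, which is less than the sum of model weight size "
    "(1059.693 MB) and temporary buffer size (11891.183 MB)."
)


def parse_memory_report(message: str) -> dict:
    """Hypothetical helper: extract the MB figures and compute the gap."""
    available = float(
        re.search(r"available single GPU memory is ([\d.]+) MB", message).group(1)
    )
    weights = float(
        re.search(r"model weight size \(([\d.]+) MB\)", message).group(1)
    )
    temp = float(
        re.search(r"temporary buffer size \(([\d.]+) MB\)", message).group(1)
    )
    required = weights + temp
    return {
        "available_mb": available,
        "weights_mb": weights,
        "temp_buffer_mb": temp,
        "required_mb": required,
        # How far the request exceeds what the device reports.
        "shortfall_mb": required - available,
        # Largest temporary buffer that would still fit next to the weights.
        "max_temp_buffer_mb": available - weights,
    }


report = parse_memory_report(ERROR)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

By these numbers the weights alone fit comfortably in the 4352 MB reported; the temporary buffer is what pushes the total to roughly three times the available memory, and it would need to shrink to about 3.3 GB or less for the load to succeed.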
