GPU memory temporary buffer size too high.
#1 opened by bhushangawde32
I tested several converted DeepSeek R1 Distill Qwen 1.5B models in the MLCChat app on an iPhone 15 Plus and a Google Pixel 8 Pro. All of them require far more GPU memory than the devices provide, so the model fails to load on both iOS and Android.
Is there a way to make this run on a smartphone?
FATAL EXCEPTION: Thread-4
Process: ai.mlc.mlcchat, PID: 14195
org.apache.tvm.Base$TVMError: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, which is less than the sum of model weight size (1059.693 MB) and temporary buffer size (11891.183 MB).
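To put numbers on the error above, here is a minimal sketch that parses the message and works out how far over budget the model is. The function name `parse_memory_error` and the regular expression are my own, written against the exact wording of this one log line; they are not part of MLC or TVM.

```python
import re

# The error text copied from the log above.
ERROR = (
    "Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, "
    "which is less than the sum of model weight size (1059.693 MB) and "
    "temporary buffer size (11891.183 MB)."
)


def parse_memory_error(message):
    """Extract the three sizes (in MB) from the message and derive the gap.

    Hypothetical helper: the pattern assumes the wording shown in this log.
    """
    pattern = (
        r"available single GPU memory is ([\d.]+) MB.*?"
        r"model weight size \(([\d.]+) MB\).*?"
        r"temporary buffer size \(([\d.]+) MB\)"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("message does not match the expected format")
    available, weights, temp_buffer = (float(g) for g in match.groups())
    required = weights + temp_buffer
    return {
        "available_mb": available,
        "weights_mb": weights,
        "temp_buffer_mb": temp_buffer,
        "required_mb": required,
        # How much more memory the model asks for than the device reports.
        "shortfall_mb": required - available,
        # Largest temporary buffer that would still fit next to the weights.
        "max_temp_buffer_mb": available - weights,
    }


info = parse_memory_error(ERROR)
print(f"required:        {info['required_mb']:.3f} MB")
print(f"shortfall:       {info['shortfall_mb']:.3f} MB")
print(f"max temp buffer: {info['max_temp_buffer_mb']:.3f} MB")
```

So the weights alone (about 1 GB) fit comfortably in the 4352 MB reported. The temporary buffer is what exceeds the limit, and it would have to shrink from roughly 11.9 GB to under about 3.3 GB for the check to pass.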