Bug when using function calling with vllm==0.8.4

#4
by waple - opened

I simply ported the function calling demo from Transformers to vLLM and got:

```
RuntimeError: Failed running call_function (*((FakeTensor(..., device='cuda:0', size=(s0, 4096), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(s0, 4096), dtype=torch.bfloat16)), Parameter(FakeTensor(..., device='cuda:0', size=(13696, 4096), dtype=torch.bfloat16)), None), **{}):
```

When I downgraded vllm to 0.8.3, it ran successfully.
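
For context, here is a minimal sketch of the kind of vLLM function-calling request that hits this; the model id (`THUDM/GLM-4-9B-0414`) and the `get_weather` tool schema are illustrative stand-ins, since the original demo code isn't shown:

```python
# Minimal repro sketch (assumptions: the THUDM/GLM-4-9B-0414 checkpoint and an
# illustrative get_weather tool; the original demo code is not shown above).
from vllm import LLM, SamplingParams

# A toy tool schema in the OpenAI function-calling format that vLLM's chat API accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]

llm = LLM(model="THUDM/GLM-4-9B-0414")
# On vllm==0.8.4 this raises the RuntimeError above; on 0.8.3 it succeeds.
out = llm.chat(messages, SamplingParams(max_tokens=256), tools=tools)
print(out[0].outputs[0].text)
```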

Z.ai & THUKEG org

This is fixed by https://github.com/vllm-project/vllm/pull/16618 ("Fix the inference problem of GLM-4-0414").
