Inference problems for all Qwen2.5 VL models in transformers above 4.49.0

#26
by mirekphd

I've been having various inference problems (such as value errors or garbled output) with AWQ quantizations of Qwen2.5 VL models of various sizes (from 3B to 72B) since the transformers package was upgraded to v4.50.0, and these problems persist in the latest versions as well (4.50.2 and 4.50.3). They occur regardless of whether Flash Attention is used. They should be easy to reproduce, as long as you use a VL model (of any size) with an AWQ quant; standard Qwen2.5 LLMs in AWQ quant are fine, i.e. compatible with the latest transformers versions.
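For reference, a minimal reproduction sketch along the lines of the standard Qwen2.5-VL usage pattern (assuming transformers >= 4.50.0, autoawq, and qwen-vl-utils are installed; the checkpoint ID and image URL below are illustrative and can be swapped for any AWQ-quantized Qwen2.5 VL size):

```python
# Minimal reproduction sketch for the reported issue.
# Assumptions: transformers>=4.50.0, autoawq, qwen-vl-utils installed;
# model ID and image URL are illustrative placeholders.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct-AWQ"  # any AWQ VL size (3B-72B) is said to reproduce it

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # issue reportedly occurs with or without Flash Attention
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/demo.jpeg"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the prompt and vision inputs, then generate.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
output = processor.batch_decode(
    generated_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(output[0])  # on transformers>4.49.0 this reportedly raises a ValueError or prints garbled text
```

With transformers pinned to 4.49.0 the same script reportedly works as expected, so downgrading is a possible workaround until the regression is fixed.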

Note: I moved the issue to GitHub here: https://github.com/QwenLM/Qwen2.5-VL/issues/1033
(apologies for double-posting)
