Flash Attention 2 for Qwen2_5OmniToken2WavModel

#23
by fduches2 - opened

Is Flash Attention 2 expected to be supported for the Qwen2_5OmniToken2WavModel? Currently, as far as I can tell, it only works in fp32.
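For context, the FlashAttention-2 kernels only accept fp16 and bf16 tensors, so a module that has to run in fp32 cannot use them and needs another attention backend. Below is a minimal sketch of that fallback check. The helper names `pick_attn_implementation` and `detect_flash_attn` are hypothetical and not part of any library. The returned strings match the `attn_implementation` values used by Hugging Face Transformers, but whether Qwen2_5OmniToken2WavModel honours them is an assumption.

```python
import importlib.util

# Dtypes the FlashAttention-2 kernels accept; fp32 is not among them.
FA2_DTYPES = {"float16", "bfloat16"}


def pick_attn_implementation(dtype: str, flash_attn_installed: bool, cuda_available: bool) -> str:
    """Hypothetical helper: choose an attention backend for a module running in `dtype`."""
    if dtype in FA2_DTYPES and flash_attn_installed and cuda_available:
        return "flash_attention_2"
    # PyTorch's scaled_dot_product_attention works in fp32 and needs no extra package.
    return "sdpa"


def detect_flash_attn() -> bool:
    """Hypothetical helper: True if the flash_attn package can be imported."""
    return importlib.util.find_spec("flash_attn") is not None


if __name__ == "__main__":
    installed = detect_flash_attn()
    print(pick_attn_implementation("bfloat16", installed, cuda_available=True))
    print(pick_attn_implementation("float32", installed, cuda_available=True))
```

The fp32 case always falls back to SDPA here, regardless of whether flash-attn is installed.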

Yes, Flash Attention is a problem for everyone!

The Mistral model also used Flash Attention but still allowed other options, such as 4-bit loading. This model seems to offer other options too, so why will it not load with eager or SDPA attention?

I think this needs to be fixed, as it is very important for loading the model!

Flash Attention is a known source of installation and compatibility problems, because the Flash Attention team is not building a universal package that can be installed on all systems. It seems to be locked to Linux, just as Triton was, although Triton runs on Windows now!

Perhaps the Unsloth loader will solve this in the future; other models already run better with Unsloth.
