ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`

#76
by praveeny - opened

I was running Phi-2 on my CPU in a Jupyter notebook. When I tried again just now, it broke :-((

I see that the model has been updated. From the little research I did, flash_attn apparently requires an Nvidia GPU. How do I run this on a CPU now? Or is that no longer an option?

P.S.: I am unable to install flash_attn. I have updated torch, transformers, packaging and wheel, and now I see the following error when trying to install this package. I don't have CUDA.

raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

  torch.__version__  = 2.1.2+cpu

Facing the same issue. I'm trying to download the model weights and build a Docker image with vLLM, and it gave the same error. It worked perfectly fine 6 hours ago, but something seems to have broken with the latest commit.

In the meantime, how do we pull weights programmatically from a previous commit ID?
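One way to do this is the `revision` argument that `from_pretrained` accepts in transformers. The sketch below is a minimal illustration, not an official recipe: the helper names `pinned_kwargs` and `load_pinned` are hypothetical, and the commit hash is a placeholder that you would copy from the repository's commit history on the Hub.

```python
def pinned_kwargs(commit_id: str) -> dict:
    """Build keyword arguments that pin a Hub download to one commit.

    `commit_id` is a commit hash (or branch/tag name) taken from the
    model repository's history. This helper is hypothetical.
    """
    if not commit_id:
        raise ValueError("commit_id must be a non-empty string")
    # `revision` selects the commit; `trust_remote_code` is needed because
    # the repository ships its own modeling file.
    return {"revision": commit_id, "trust_remote_code": True}


def load_pinned(repo_id: str, commit_id: str):
    """Load tokenizer and model from a specific commit (hypothetical helper)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = pinned_kwargs(commit_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
    return tokenizer, model


if __name__ == "__main__":
    # Placeholder hash: replace with a real commit ID from the repo history.
    print(pinned_kwargs("<previous-commit-id>"))
```

Calling `load_pinned("microsoft/phi-2", "<previous-commit-id>")` with a real hash should fetch both the weights and the modeling code as they were at that commit, which also makes Docker builds reproducible.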

Microsoft org
•
edited Jan 12, 2024

Hello everyone!

We deployed a fix and it should be working now.

The issue was caused by the combination of using dynamic modules and remote code loading in transformers.

Regards,
Gustavo.

@gugarosa Hey, just curious to understand the motivation behind renaming layer_norm_epsilon to layer_norm_eps in config.json?

I see vLLM uses layer_norm_epsilon across all of its models, so the recent commits in this repo are breaking things in vLLM.

Microsoft org

I think we will need to update vLLM as well.

There is no particular reason for using layer_norm_eps. It was used in the first implementation of Phi (internally in transformers), and we followed it to minimize friction when merging the integration.

Microsoft org

By the way, there is an active PR that will fix it: https://github.com/vllm-project/vllm/pull/2428/files

Since the layer naming was changed for consistency reasons, don't you think it would be better to align with "layer_norm_epsilon" too?
On the other hand, Llama uses "rms_norm_eps"... go figure.

Microsoft org
•
edited Jan 12, 2024

I definitely agree!

Maybe an attribute_map: {"layer_norm_epsilon": "layer_norm_eps"} in configuration_phi.py would fix the issue. And it would be an easier PR.
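To illustrate the idea, here is a toy stand-in written in plain Python. It is not the transformers `PretrainedConfig` class and not the actual configuration_phi.py; `ToyPhiConfig` and its default value are hypothetical, and the class only mimics how an `attribute_map` lets an old name resolve to the new one.

```python
class ToyPhiConfig:
    """Toy config showing attribute aliasing (hypothetical, for illustration)."""

    # alias name -> canonical attribute name
    attribute_map = {"layer_norm_epsilon": "layer_norm_eps"}

    def __init__(self, layer_norm_eps: float = 1e-5):
        self.layer_norm_eps = layer_norm_eps

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for aliases or typos.
        canonical = type(self).attribute_map.get(name)
        if canonical is None:
            raise AttributeError(name)
        return getattr(self, canonical)

    def __setattr__(self, name, value):
        # Writes through an alias land on the canonical attribute.
        canonical = type(self).attribute_map.get(name, name)
        super().__setattr__(canonical, value)


cfg = ToyPhiConfig(layer_norm_eps=1e-6)
# Code that still reads the old name sees the same value.
assert cfg.layer_norm_epsilon == cfg.layer_norm_eps

cfg.layer_norm_epsilon = 1e-3  # write via the alias
assert cfg.layer_norm_eps == 1e-3
```

With an alias like this, downstream code that reads `layer_norm_epsilon` would keep working while config.json stores `layer_norm_eps`.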

praveeny changed discussion status to closed
