Commit 2a16d3d1ef9653c6e6fd68d935135cfb63eaa722 raises HTTPError when loading `nomic-ai/nomic-embed-text-v1` in `transformers`

#17
by GabrielFreeze-2 - opened

I was loading the model as documented. However, after the change to config.json and the deletion of modeling_hf_nomic_bert.py and configuration_hf_nomic_bert.py, huggingface_hub is unable to find the model.

Could not locate the nomic-ai/nomic-bert-2048--configuration_hf_nomic_bert.py inside nomic-ai/nomic-embed-text-v1.
Traceback (most recent call last):
  File "C:\Users\User\anaconda3\envs\web-ext\lib\site-packages\huggingface_hub\utils\_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "C:\Users\User\anaconda3\envs\web-ext\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/nomic-ai/nomic-embed-text-v1/resolve/main/nomic-ai/nomic-bert-2048--configuration_hf_nomic_bert.py

Hey, sorry, I was trying to consolidate all the code into a single place, as we had 3-4 different versions! Does this code not work for you?

>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.03k/2.03k [00:00<00:00, 18.5MB/s]
configuration_hf_nomic_bert.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.96k/1.96k [00:00<00:00, 25.5MB/s]
A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-bert-2048:
- configuration_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_hf_nomic_bert.py: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 52.8k/52.8k [00:00<00:00, 79.4MB/s]
A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-bert-2048:
- modeling_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
pytorch_model.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 547M/547M [00:01<00:00, 446MB/s]
<All keys matched successfully>

I tried with transformers version 4.40, 4.37, 4.35. What version are you using?

zpn changed discussion status to closed
zpn changed discussion status to open

I ran your code above with both transformers==4.26.1 and transformers==4.40, but I still get the same error.

Moreover, this is what the .cache\huggingface\hub\models--nomic-ai--nomic-embed-text-v1\snapshots\{commit} directory looks like. I am running on Windows 11.

image.png

Hm, does it work if you clear the Hugging Face cache?
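For anyone following along, here is a minimal sketch of clearing the cached copy of a single repo rather than the whole cache. It relies only on the `models--<org>--<name>` folder naming visible in the cache path quoted earlier in this thread. The helper names `repo_cache_dir` and `clear_repo_cache` are made up for illustration and are not part of huggingface_hub, and the default cache location mentioned afterwards is an assumption that may differ on your machine.

```python
import shutil
from pathlib import Path


def repo_cache_dir(repo_id: str, hub_cache: Path) -> Path:
    """Return the folder the hub cache uses for a model repo.

    Model repos are stored as models--<org>--<name>, for example
    models--nomic-ai--nomic-embed-text-v1.
    """
    return hub_cache / ("models--" + repo_id.replace("/", "--"))


def clear_repo_cache(repo_id: str, hub_cache: Path) -> bool:
    """Delete the cached copy of one repo.

    Returns True if a folder was removed, False if nothing was cached.
    """
    target = repo_cache_dir(repo_id, hub_cache)
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

A call would look something like `clear_repo_cache("nomic-ai/nomic-embed-text-v1", Path.home() / ".cache" / "huggingface" / "hub")`, assuming the cache has not been moved through environment variables. The next `from_pretrained` call should then download the files again.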

Clearing the Hugging Face cache successfully redownloads and loads the model!

Python 3.9.19 (main, Mar 21 2024, 17:21:27) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
pytorch_model.bin: 100%|█████████████████████| 547M/547M [00:05<00:00, 102MB/s]
<All keys matched successfully>
>>> model.eval()
  (embeddings): NomicBertEmbeddings(
    (word_embeddings): Embedding(30528, 768)
    (token_type_embeddings): Embedding(2, 768)
  )
  (emb_drop): Dropout(p=0.0, inplace=False)
  (emb_ln): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (encoder): NomicBertEncoder(
    (layers): ModuleList(
      (0-11): 12 x NomicBertBlock(
        (attn): NomicBertAttention(
          (rotary_emb): NomicBertDynamicNTKRotaryEmbedding()
          (Wqkv): Linear(in_features=768, out_features=2304, bias=False)
          (out_proj): Linear(in_features=768, out_features=768, bias=False)
          (drop): Dropout(p=0.0, inplace=False)
        )
        (mlp): NomciBertGatedMLP(
          (fc11): Linear(in_features=768, out_features=3072, bias=False)
          (fc12): Linear(in_features=768, out_features=3072, bias=False)
          (fc2): Linear(in_features=3072, out_features=768, bias=False)
        )
        (dropout1): Dropout(p=0.0, inplace=False)
        (norm1): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout2): Dropout(p=0.0, inplace=False)
      )
    )
  )
)
>>> print("Thanks")
Thanks
GabrielFreeze-2 changed discussion status to closed

Just to add some more information about this issue

Clearing the Hugging Face cache and then reloading the model only worked for me with transformers==4.40.1. I have another dependency that needs transformers==4.26.1, and when I try to load the model with that version I still get the "Could not locate the nomic-ai/nomic-bert-2048--configuration_hf_nomic_bert.py inside nomic-ai/nomic-embed-text-v1." error, as in this issue: https://huggingface.co/nomic-ai/nomic-embed-text-v1/discussions/18

@GabrielFreeze-2 Is there any way you can use 4.29.0, which was released last year? It seems that was when the feature to reference other repos was introduced: https://github.com/huggingface/transformers/releases/tag/v4.29.0
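A rough way to check this up front is sketched below. It treats 4.29.0 as the minimum version, which is only the guess made in this thread, not a confirmed requirement. The names `parse_version`, `supports_cross_repo_code`, and `installed_transformers_version` are hypothetical helpers, and the parser is deliberately simple (it only looks at the first three numeric components).

```python
from importlib import metadata

# Assumption taken from this thread: cross-repo code references
# appear to need transformers 4.29.0 or newer.
MIN_CROSS_REPO_VERSION = (4, 29, 0)


def parse_version(text: str) -> tuple:
    """Turn '4.26.1' or '4.40.0.dev0' into a (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '4.40'
    return tuple(parts)


def supports_cross_repo_code(version_text: str) -> bool:
    """Return True if the version meets the assumed minimum."""
    return parse_version(version_text) >= MIN_CROSS_REPO_VERSION


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

With the versions mentioned here, `supports_cross_repo_code("4.26.1")` is false and `supports_cross_repo_code("4.40.1")` is true, which matches what was observed above.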

Unfortunately not, because I am also using salesforce-lavis==1.0.2, which requires transformers<4.27,>=4.25.0. However, I found a workaround: creating two Python environments and running each script from its respective environment.

Downloading transformers-4.29.0-py3-none-any.whl (7.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 18.8 MB/s eta 0:00:00
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.26.1
    Uninstalling transformers-4.26.1:
      Successfully uninstalled transformers-4.26.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
salesforce-lavis 1.0.2 requires transformers<4.27,>=4.25.0, but you have transformers 4.29.0 which is incompatible.
Successfully installed transformers-4.29.0
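The two-environment workaround could be driven from a single script along these lines. This is only a sketch: `run_in_env` is a hypothetical helper, and the interpreter paths and script names in the note below are placeholders, not the ones actually used.

```python
import subprocess


def run_in_env(python_exe: str, args: list) -> str:
    """Run a command with a specific environment's Python interpreter.

    python_exe is the path to that environment's interpreter, so the
    child process sees that environment's installed packages.
    Raises subprocess.CalledProcessError if the child exits with an error.
    """
    result = subprocess.run(
        [python_exe, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_in_env(r"C:\path\to\envs\embed\python.exe", ["embed.py"])` would run a hypothetical `embed.py` in an environment with a newer transformers, while a second call pointing at the other environment's interpreter would run the salesforce-lavis script.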
Agh, I'm sorry, that's quite annoying. Let me try to think of a better workaround. I'd prefer not to have three files for the same model, since they get out of sync and are hard to track, which is why I made the switch.