Could not locate the configuration_hf_nomic_bert.py inside nomic-ai/nomic-bert-2048. Error

#25
by Usersh - opened

I downloaded the model manually from the Files and versions tab, then packaged it into a folder that I scp'd to a Linux box. The remote server on that box refuses any outbound connection, so I can't download the model from a URL or use the from_pretrained method to download it. That means this "solution" won't work for me: https://huggingface.co/nomic-ai/nomic-embed-text-v1/discussions/17#6633bb7e7881bff9ddfca8d6.
configuration_hf_nomic_bert.py is in the nomic-bert-2048 directory, which sits in the model folder alongside the nomic-embed-text-v1.5 model folder.
I'm not sure why it can detect the nomic-bert-2048 folder, which I didn't define a path to, but not the configuration_hf_nomic_bert.py inside that folder.
I tried adding the nomic-bert-2048 model folder to sys.path, and tried using os.path.abspath and os.path.join, but none of those resolved this error: "Could not locate the configuration_hf_nomic_bert.py inside nomic-ai/nomic-bert-2048."
What is causing this? How did it successfully find the nomic-bert-2048 folder when I didn't specify its directory?
Does anyone know how to resolve this error?
Should I put the nomic-bert-2048 folder inside the nomic-embed-text-v1.5 model folder?
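For reference, the local layout I'm describing is roughly this (folder and file names abbreviated to the ones relevant to the error):

model folder/
    nomic-embed-text-v1.5/
        config.json, tokenizer files, model weights
    nomic-bert-2048/
        configuration_hf_nomic_bert.py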

following this thread

Nomic AI org

What transformers version are you running?

I'm getting this error with transformers 4.44.0

I used another embedding model and it worked fine. It looks like this only happens with the Nomic model.

Nomic AI org

I'm unable to reproduce your error. Can you please post the full error, Python version, and environment so I can try to reproduce it?

This was from a while back, but my environment is RHEL 8, Python 3.11.5, transformers 4.41.2.

Error message was fairly simple:
"Could not locate the configuration_hf_nomic_bert.py inside nomic-ai/nomic-bert-2048."

I think it is related to how my environments are set up and/or to what is stored in the Hugging Face cache.
Since you couldn't replicate the error, I'm assuming the issue is probably caused by the proxy server we use.

While I was trying to resolve this a few months ago, I found that the issue might be that the Nomic model requires a BERT backend. I thought I could bypass it by installing both nomic-embed-text-v1.5 and nomic-bert-2048 and packaging them together (putting nomic-bert-2048 inside the nomic-embed-text folder), but that didn't help.

For some reason (I assume a proxy error), Python can detect the nomic-bert-2048 folder, which I didn't define a path to, but not the configuration_hf_nomic_bert.py located inside it. (See the second screenshot.)
The nomic-bert-2048 folder is located in the nomic-ai folder and its path is defined in my script, but Python still can't locate configuration_hf_nomic_bert.py.

The model is loaded from local storage through AutoTokenizer and AutoModel:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path_to_your_local_folder")
model = AutoModel.from_pretrained("path_to_your_local_folder")
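For reference, a fully local variant of the same calls would look roughly like this (trust_remote_code is needed because this model ships custom code, local_files_only is a standard from_pretrained argument, and the path is still a placeholder):

from transformers import AutoModel, AutoTokenizer

local_path = "path_to_your_local_folder"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True, local_files_only=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True, local_files_only=True)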

(Screenshots attached: Untitled.png, image.png)

In case it helps, I faced a similar issue: the model serving system we use at my job doesn't allow downloading files. I log the model with mlflow to deploy it, and mlflow invokes save() on the SentenceTransformer model, but the two custom code files (configuration_hf_nomic_bert.py and modeling_hf_nomic_bert.py) don't get included in the local directory because of the indirection in Nomic's config.json. I worked around it like this:

from pathlib import Path
import tempfile

import mlflow.sentence_transformers
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

original_model = SentenceTransformer(
    "nomic-ai/nomic-embed-text-v1.5",
    trust_remote_code=True
)

with tempfile.TemporaryDirectory() as model_path:
    original_model.save(model_path)

    # NOTE: big hack here. The nomic-ai/nomic-embed-text-v1.5 model
    # on Hugging Face Hub references files stored under another
    # model in their account. Because of this, even though mlflow
    # invokes the save() method on the sentence transformer model,
    # which should save an offline copy of model data + code,
    # these files are not saved locally and still must be downloaded
    # later, which doesn't work for us. So, we download them
    # manually and edit the config.json to reference the local
    # copies. This could easily break if Nomic changes their
    # code organization on HF Hub.
    hf_hub_download(
        repo_id="nomic-ai/nomic-bert-2048",
        filename="configuration_hf_nomic_bert.py",
        local_dir=model_path
    )
    hf_hub_download(
        repo_id="nomic-ai/nomic-bert-2048",
        filename="modeling_hf_nomic_bert.py",
        local_dir=model_path
    )

    config_path = Path(model_path) / "config.json"
    config = open(config_path).read()
    # remove prefix so that transformers will look for these
    # files in the local model directory
    config = config.replace("nomic-ai/nomic-bert-2048--", "")
    open(config_path, "w").write(config)
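    # Illustration of what the replace above does to the auto_map entries in
    # config.json (exact class names may differ between revisions):
    #   before: "AutoModel": "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertModel"
    #   after:  "AutoModel": "modeling_hf_nomic_bert.NomicBertModel"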

    # model_path is now a fully self-contained version of the Nomic model.
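    # Sketch of the final step (hedged; the exact call depends on your mlflow
    # version and setup): reload the patched copy so transformers resolves the
    # custom code from the local files, then log it while the temporary
    # directory still exists.
    patched_model = SentenceTransformer(model_path, trust_remote_code=True)
    mlflow.sentence_transformers.log_model(patched_model, artifact_path="model")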
Nomic AI org

Thanks @christopherfox !
