Warning when loading the model with HuggingFace Transformers

#7
by yaya-sy - opened

Hi,

When I try to load the model with AutoModel.from_pretrained('utter-project/mHuBERT-147'), I get this warning:

Some weights of the model checkpoint at mHUBERT were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at mHUBERT and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Is this expected? I want to use the model to embed audio datasets.

Thanks!

I don't know the cause of your error, but when I load the model from the repo weights (checkpoint_best.pt), it works perfectly.
While you're waiting for the authors to get back to you, you can give that a try.
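
Something like the rough sketch below is what I mean, loading the original fairseq checkpoint instead of the converted transformers weights (the checkpoint path is a placeholder for wherever you downloaded checkpoint_best.pt):

from fairseq import checkpoint_utils

# Load the original fairseq checkpoint rather than the converted HF weights.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(["checkpoint_best.pt"])
hubert = models[0]
hubert.eval()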

UTTER - Unified Transcription and Translation for Extended Reality org

Hello,

Which AutoModel class are you using?

UTTER - Unified Transcription and Translation for Extended Reality org

Hi again! I think I might have just reproduced your issue on an HF Spaces Docker image by accident.
It seems conv.weight_g and conv.weight_v were replaced by conv.parametrizations.weight.original0 and conv.parametrizations.weight.original1 in a newer transformers version.

I managed to get it to load the checkpoint correctly by using the following environment:
numpy==1.26.3
torch==1.13.1
transformers==4.32.0
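
A quick way to check that the pinned environment is the one being used (just a sketch):

import numpy, torch, transformers
print(numpy.__version__, torch.__version__, transformers.__version__)

from transformers import HubertModel
# With this environment the checkpoint should load without the weight_g/weight_v warning.
model = HubertModel.from_pretrained("utter-project/mHuBERT-147")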

Hi @mzboito ,

Thank you very much for your time! I will try this environment.

mzboito changed discussion status to closed
mzboito changed discussion status to open
UTTER - Unified Transcription and Translation for Extended Reality org

Posting this here in case someone else experiences this problem with mHuBERT-147:
https://github.com/huggingface/transformers/issues/26796

Crazily enough, this is a fake warning! The weights are correctly loaded on torch>=2!
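
If you want to convince yourself the weights are really loaded, here is a quick sanity check (a sketch of mine, not from the linked issue): if the positional conv weights were actually re-initialized, two loads under different random seeds would give different tensors.

import torch
from transformers import HubertModel

def pos_conv_weight(seed):
    # Re-seed before loading, so any genuinely random initialization would change between runs.
    torch.manual_seed(seed)
    model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
    return model.encoder.pos_conv_embed.conv.weight.detach().clone()

# True -> the positional conv weights come from the checkpoint despite the warning.
print(torch.allclose(pos_conv_weight(0), pos_conv_weight(1)))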

How do I use this model?

UTTER - Unified Transcription and Translation for Extended Reality org

Hi @MonolithFoundation ,

mHuBERT-147 is a multilingual speech representation model.
Here is some more information about this model: https://huggingface.co/blog/mzboito/naver-demo-french-slu#mhubert-147-a-compact-multilingual-hubert-model
You can also check the ASR fine-tuning tutorial below.

To simply load the model as it is (not an ASR module, just a speech representation model), use the following:

from transformers import HubertModel

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
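
If you then want frame-level embeddings from raw audio, a minimal sketch looks like this; the feature-extractor settings below are the standard Wav2Vec2 ones and are my assumption, so check the model card for the exact preprocessing:

import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
model.eval()

# Assumed preprocessing: 16 kHz mono input, zero-mean/unit-variance normalization.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)

waveform = torch.zeros(16000)  # stand-in for one second of 16 kHz audio
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# last_hidden_state has shape (batch, frames, hidden_size); intermediate layers
# are in outputs.hidden_states if you prefer features from an earlier layer.
print(outputs.last_hidden_state.shape)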

I want to compare the audio and the text inside it without doing ASR; is that possible? (The same thing as comparing an image and text.)

UTTER - Unified Transcription and Translation for Extended Reality org

It is not possible. mHuBERT-147 is a speech-only encoder; you need a different model for text.
