Warning when loading the model with HuggingFace Transformers

#7
by yaya-sy - opened

Hi,

When I try to load the model with AutoModel.from_pretrained('utter-project/mHuBERT-147'), I get this warning:

Some weights of the model checkpoint at mHUBERT were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at mHUBERT and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Is this expected? I want to use the model to embed audio datasets.

Thanks!

I don't know the cause of your error, but when I load the model from the repo weights (checkpoint_best.pt), it works perfectly.
While you're waiting for the authors to get back to you, you can give that a try.
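
Something like the rough sketch below is what I mean, loading the original fairseq checkpoint instead of the converted transformers weights (the checkpoint path is a placeholder for wherever you downloaded checkpoint_best.pt):

from fairseq import checkpoint_utils

# Load the original fairseq checkpoint rather than the converted HF weights.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(["checkpoint_best.pt"])
hubert = models[0]
hubert.eval()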

UTTER - Unified Transcription and Translation for Extended Reality org

Hello,

Which AutoModel class are you using?

UTTER - Unified Transcription and Translation for Extended Reality org

Hi again! I think I might have just reproduced your issue on an HF Spaces Docker image by accident.
It seems conv.weight_g and conv.weight_v were replaced by conv.parametrizations.weight.original0 and conv.parametrizations.weight.original1 in a newer transformers version.

I managed to get it to load the checkpoint correctly by using the following environment:
numpy==1.26.3
torch==1.13.1
transformers==4.32.0
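
A quick way to check that the pinned environment is the one being used (just a sketch):

import numpy, torch, transformers
print(numpy.__version__, torch.__version__, transformers.__version__)

from transformers import HubertModel
# With this environment the checkpoint should load without the weight_g/weight_v warning.
model = HubertModel.from_pretrained("utter-project/mHuBERT-147")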

Hi @mzboito ,

Thank you very much for your time! I will try this environment.

mzboito changed discussion status to closed
mzboito changed discussion status to open
UTTER - Unified Transcription and Translation for Extended Reality org

Posting this here in case someone else experiences this problem with mHuBERT-147:
https://github.com/huggingface/transformers/issues/26796

Crazily enough, this is a fake warning! The weights are correctly loaded on torch>=2!
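
If you want to convince yourself the weights are really loaded, here is a quick sanity check (a sketch of mine, not from the linked issue): if the positional conv weights were actually re-initialized, two loads under different random seeds would give different tensors.

import torch
from transformers import HubertModel

def pos_conv_weight(seed):
    # Re-seed before loading, so any genuinely random initialization would change between runs.
    torch.manual_seed(seed)
    model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
    return model.encoder.pos_conv_embed.conv.weight.detach().clone()

# True -> the positional conv weights come from the checkpoint despite the warning.
print(torch.allclose(pos_conv_weight(0), pos_conv_weight(1)))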

How do I use this model?

UTTER - Unified Transcription and Translation for Extended Reality org

Hi @MonolithFoundation ,

mHuBERT-147 is a multilingual speech representation model.
Here is some more information about this model: https://huggingface.co/blog/mzboito/naver-demo-french-slu#mhubert-147-a-compact-multilingual-hubert-model
You can also check the ASR fine-tuning tutorial below.

To simply load the model as it is (not an ASR module, just a speech representation model), use the following:

from transformers import HubertModel

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
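
If you then want frame-level embeddings from raw audio, a minimal sketch looks like this; the feature-extractor settings below are the standard Wav2Vec2 ones and are my assumption, so check the model card for the exact preprocessing:

import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
model.eval()

# Assumed preprocessing: 16 kHz mono input, zero-mean/unit-variance normalization.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)

waveform = torch.zeros(16000)  # stand-in for one second of 16 kHz audio
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# last_hidden_state has shape (batch, frames, hidden_size); intermediate layers
# are in outputs.hidden_states if you prefer features from an earlier layer.
print(outputs.last_hidden_state.shape)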

I want to compare the audio and the text inside it without doing ASR; is that possible? (The same thing as comparing an image and text.)

UTTER - Unified Transcription and Translation for Extended Reality org

It is not possible. mHuBERT-147 is a speech-only encoder; you need a different model for text.
