vumichien/AV-HuBERT

Automatic Speech Recognition

Audio Visual to Text

vumichien committed on Jan 16, 2023

Commit 3d63ba1 • 1 Parent(s): 4cb31e8

Update README.md

Files changed (1)

README.md +6 -0

@@ -15,6 +15,12 @@ tags:
 These are model weights originally provided by the authors of the paper [Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction](https://arxiv.org/pdf/2201.02184.pdf).
 Video recordings of speech contain correlated audio and visual information, providing a strong signal for speech representation learning from the speaker’s lip
 movements and the produced sound.

 These are model weights originally provided by the authors of the paper [Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction](https://arxiv.org/pdf/2201.02184.pdf).
+<figure>
+  <img src="https://huggingface.co/vumichien/AV-HuBERT/blob/main/HuBert.png" alt="Audio-visual HuBERT">
+  <figcaption>Audio-visual HuBERT
+  </figcaption>
+</figure>
 Video recordings of speech contain correlated audio and visual information, providing a strong signal for speech representation learning from the speaker’s lip
 movements and the produced sound.