vumichien/AV-HuBERT

vumichien committed on Jan 17, 2023

Commit 1ae7cce · 1 Parent(s): 23fc185

Update README.md

Files changed (1)

README.md +8 -0

README.md CHANGED

@@ -27,5 +27,13 @@ movements and the produced sound.
 Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised representation learning framework for audio-visual speech, which masks multi-stream video input and predicts automatically discovered and iteratively refined multimodal hidden units. AV-HuBERT
 learns powerful audio-visual speech representation benefiting both lip-reading and automatic speech recognition.
+## Example
+<figure>
+  <img src="https://huggingface.co/vumichien/AV-HuBERT/resolve/main/lipreading.gif" alt="Audio-Visual Speech Recognition">
+  <figcaption> Speech Recognition from Lip video
+  </figcaption>
+</figure>
 ## Datasets
 The authors trained the model on lip-reading benchmark LRS3 datasets (433 hours).