Update README.md
README.md
@@ -27,5 +27,13 @@ movements and the produced sound.
 Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised representation learning framework for audio-visual speech, which masks multi-stream video input and predicts automatically discovered and iteratively refined multimodal hidden units. AV-HuBERT
 learns powerful audio-visual speech representation benefiting both lip-reading and automatic speech recognition.
 
+## Example
+
+<figure>
+<img src="https://huggingface.co/vumichien/AV-HuBERT/resolve/main/lipreading.gif" alt="Audio-Visual Speech Recognition">
+<figcaption> Speech Recognition from Lip video
+</figcaption>
+</figure>
+
 ## Datasets
 The authors trained the model on lip-reading benchmark LRS3 datasets (433 hours).
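The README context lines describe the core training idea: mask part of a fused audio-visual input, then predict discrete "hidden units" on the masked frames. A toy NumPy sketch can make this concrete. It is not the AV-HuBERT implementation. Several details are illustrative assumptions: the feature sizes, plain k-means as the unit-discovery step, zero-filling of masked frames, the span-masking parameters, and the helper names `kmeans_units`, `span_mask` and `masked_prediction_loss`. Random logits also stand in for a real model's output. The "iteratively refined" part would repeat the clustering on the model's own learned features in later rounds; that outer loop is omitted here.

```python
import numpy as np

def kmeans_units(features, k, iters=10, seed=0):
    """Hypothetical helper: label each frame with a discrete unit via plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames (fancy indexing returns a copy).
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every frame to every centroid: shape (T, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def span_mask(num_frames, mask_prob, span, seed=0):
    """Hypothetical helper: mask contiguous spans covering roughly mask_prob of the frames."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_frames, dtype=bool)
    num_spans = max(1, int(mask_prob * num_frames / span))
    starts = rng.choice(num_frames - span + 1, size=num_spans, replace=False)
    for s in starts:
        mask[s:s + span] = True  # spans may overlap
    return mask

def masked_prediction_loss(logits, targets, mask):
    """Cross-entropy against the unit labels, computed on masked frames only."""
    logits, targets = logits[mask], targets[mask]
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy "multi-stream" input: per-frame audio and lip-video features, fused by concatenation.
rng = np.random.default_rng(0)
T, K = 50, 5
audio = rng.normal(size=(T, 4))
video = rng.normal(size=(T, 6))
fused = np.concatenate([audio, video], axis=1)  # shape (T, 10)

units = kmeans_units(fused, k=K)                     # discovered hidden units, one per frame
mask = span_mask(T, mask_prob=0.3, span=5)           # which frames the model cannot see
masked_input = np.where(mask[:, None], 0.0, fused)   # simplification: zero out masked frames

# A real encoder would map masked_input to logits; random logits stand in here.
logits = rng.normal(size=(T, K))
loss = masked_prediction_loss(logits, units, mask)
print(f"masked frames: {mask.sum()}, loss: {loss:.3f}")
```

Restricting the loss to masked frames forces the model to infer the missing audio-visual content from surrounding context. That is the property the README credits for representations that help both lip-reading and speech recognition.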