patrickvonplaten committed
Commit 2493a2c • 1 Parent(s): e5b789b
Update README.md
README.md CHANGED
@@ -21,7 +21,9 @@ Speech datasets from multiple domains were used to pretrain the model:
 - [Switchboard](https://catalog.ldc.upenn.edu/LDC97S62): telephone speech corpus; noisy telephone data
 - [Fisher](https://catalog.ldc.upenn.edu/LDC2004T19): conversational telephone speech; noisy telephone data
 
-When using the model make sure that your speech input is also sampled at 16kHz.
+When using the model make sure that your speech input is also sampled at 16kHz.
+
+**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more in-depth explanation of how to fine-tune the model.
 
 [Paper Robust Wav2Vec2](https://arxiv.org/abs/2104.01027)
 
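The added text stresses that inputs must be sampled at 16 kHz. As a rough, non-authoritative sketch of what that looks like with the `transformers` feature-extractor API: the checkpoint ID `facebook/wav2vec2-large-robust` (inferred from the linked paper, not stated in this diff) and the input file name are assumptions.

```python
# Minimal sketch, not part of the commit. Assumes the checkpoint is
# "facebook/wav2vec2-large-robust" and that torchaudio is available for resampling.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "facebook/wav2vec2-large-robust"  # assumed repo ID; substitute the actual one
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)

# Resample the audio to 16 kHz, the rate the model was pretrained on.
waveform, sample_rate = torchaudio.load("example.wav")  # hypothetical input file
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = feature_extractor(
    waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt"
)
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (batch, frames, hidden_size)
```

Without a fine-tuned head, the output is only the encoder's hidden states, which matches the note that the checkpoint cannot transcribe speech on its own.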
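For the fine-tuning path the note describes, a hedged sketch of the setup (following the recipe in the linked blog post) might look as follows; the character vocabulary, the checkpoint ID, and the omitted training loop are placeholders, not part of the commit.

```python
# Sketch of wrapping the pretrained encoder with a CTC head for speech recognition.
# The vocabulary below is a stand-in; in practice it is built from the
# transcriptions of the labeled fine-tuning dataset.
import json
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

model_id = "facebook/wav2vec2-large-robust"  # assumed repo ID; substitute the actual one

vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2, "a": 3, "b": 4, "c": 5}  # placeholder vocab
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# The CTC head is randomly initialized on top of the pretrained encoder and only
# becomes useful after fine-tuning on (audio, transcription) pairs.
model = Wav2Vec2ForCTC.from_pretrained(
    model_id,
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
# ... fine-tune with the Trainer API as described in the linked blog post ...
```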