ivangtorre committed
Commit d62e767 • Parent(s): 0299e04
Update README.md

README.md CHANGED
@@ -4,7 +4,6 @@ language:
 - qu
 metrics:
 - cer
-- wer
 pipeline_tag: automatic-speech-recognition
 datasets:
 - ivangtorre/second_americas_nlp_2022
@@ -27,13 +26,15 @@ model-index:
     metrics:
     - name: Test CER
       type: cer
-      value:
-
-      type: wer
-      value: 11.11
+      value: 16.02
+
 ---
 
-
+This model was finetuned from a Wav2vec2.0 XLS-R 300M model on the Quechua train partition of the Americas NLP 2022 dataset. This challenge took place during NeurIPS 2022.
+
+
+
+## Example of usage
 
 The model can be used directly (without a language model) as follows:
 
@@ -46,11 +47,16 @@ import torchaudio
 processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
 model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
 
-#
-
+# Path to wav file
+pathfile = "/path/to/wavfile"
 
-#
-
+# Load and normalize the file
+wav, curr_sample_rate = sf.read(pathfile, dtype="float32")
+feats = torch.from_numpy(wav).float()
+with torch.no_grad():
+    feats = F.layer_norm(feats, feats.shape)
+    feats = torch.unsqueeze(feats, 0)
+    logits = model(feats).logits
 
 # take argmax and decode
 predicted_ids = torch.argmax(logits, dim=-1)
@@ -87,8 +93,6 @@ def map_to_pred(batch):
 result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1)
 
 print("CER:", cer(result["source_processed"], result["transcription"]))
-print("WER:", cer(result["source_processed"], result["transcription"]))
-
 ```
 
 ## Citation
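The usage snippet added in this commit normalizes the raw waveform with `F.layer_norm(feats, feats.shape)`, i.e. it rescales the whole signal to zero mean and unit variance before inference, matching how the XLS-R checkpoint was trained. A dependency-free sketch of just that step (the helper name `normalize_waveform` is hypothetical; `eps` mirrors PyTorch's default layer-norm epsilon):

```python
import math

def normalize_waveform(samples, eps=1e-5):
    """Zero-mean, unit-variance normalization of a 1-D waveform,
    mirroring F.layer_norm(feats, feats.shape) over the whole signal."""
    n = len(samples)
    mean = sum(samples) / n
    # layer_norm uses the biased (divide-by-n) variance
    var = sum((s - mean) ** 2 for s in samples) / n
    scale = 1.0 / math.sqrt(var + eps)
    return [(s - mean) * scale for s in samples]
```

Because the normalization statistics are taken over the entire clip, the result depends on the clip's own loudness only through `eps`, which is one reason the model card feeds each file through this step rather than relying on the recording level.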
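The evaluation hunk reports a character error rate via a `cer` function whose import is not shown in this diff (presumably a metrics library such as jiwer — an assumption here). As a sketch of what such a metric computes, assuming nothing beyond the standard definition (total character-level Levenshtein distance divided by total reference length):

```python
def cer(references, hypotheses):
    """Character error rate: summed edit distance between each reference
    and hypothesis string, divided by the total reference character count."""
    total_edits, total_chars = 0, 0
    for ref, hyp in zip(references, hypotheses):
        # classic dynamic-programming edit distance over characters
        prev = list(range(len(hyp) + 1))
        for i, rc in enumerate(ref, 1):
            curr = [i]
            for j, hc in enumerate(hyp, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (rc != hc)))   # substitution
            prev = curr
        total_edits += prev[-1]
        total_chars += len(ref)
    return total_edits / total_chars
```

This also makes the bug fixed by the commit easy to see: the removed line printed a value labeled "WER" but called `cer(...)`, so it would have reported the character-level score under a word-level name.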