ivangtorre committed
Commit d62e767 • Parent(s): 0299e04
Update README.md

README.md CHANGED
@@ -4,7 +4,6 @@ language:
 - qu
 metrics:
 - cer
-- wer
 pipeline_tag: automatic-speech-recognition
 datasets:
 - ivangtorre/second_americas_nlp_2022
@@ -27,13 +26,15 @@ model-index:
     metrics:
     - name: Test CER
       type: cer
-      value:
-
-      type: wer
-      value: 11.11
+      value: 16.02
+
 ---
 
-
+This model was finetuned from a Wav2vec2.0 XLS-R 300M model on the Quechua train partition of the Americas NLP 2022 dataset. This challenge took place during NeurIPS 2022.
+
+
+
+## Example of usage
 
 The model can be used directly (without a language model) as follows:
 
@@ -46,11 +47,16 @@ import torchaudio
 processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
 model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
 
-#
-
+# Path to wav file
+pathfile = "/path/to/wavfile"
 
-#
-
+# Load and normalize the file
+wav, curr_sample_rate = sf.read(pathfile, dtype="float32")
+feats = torch.from_numpy(wav).float()
+with torch.no_grad():
+    feats = F.layer_norm(feats, feats.shape)
+    feats = torch.unsqueeze(feats, 0)
+    logits = model(feats).logits
 
 # take argmax and decode
 predicted_ids = torch.argmax(logits, dim=-1)
@@ -87,8 +93,6 @@ def map_to_pred(batch):
 result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1)
 
 print("CER:", cer(result["source_processed"], result["transcription"]))
-print("WER:", cer(result["source_processed"], result["transcription"]))
-
 ```
 
 ## Citation
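The usage snippet added in this commit normalizes the raw waveform with `F.layer_norm(feats, feats.shape)`, i.e. it rescales the whole signal to zero mean and unit variance before inference, matching how the XLS-R checkpoint was trained. A dependency-free sketch of just that step (the helper name `normalize_waveform` is hypothetical; `eps` mirrors PyTorch's default layer-norm epsilon):

```python
import math

def normalize_waveform(samples, eps=1e-5):
    """Zero-mean, unit-variance normalization of a 1-D waveform,
    mirroring F.layer_norm(feats, feats.shape) over the whole signal."""
    n = len(samples)
    mean = sum(samples) / n
    # layer_norm uses the biased (divide-by-n) variance
    var = sum((s - mean) ** 2 for s in samples) / n
    scale = 1.0 / math.sqrt(var + eps)
    return [(s - mean) * scale for s in samples]
```

Because the normalization statistics are taken over the entire clip, the result depends on the clip's own loudness only through `eps`, which is one reason the model card feeds each file through this step rather than relying on the recording level.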
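The evaluation hunk reports a character error rate via a `cer` function whose import is not shown in this diff (presumably a metrics library such as jiwer — an assumption here). As a sketch of what such a metric computes, assuming nothing beyond the standard definition (total character-level Levenshtein distance divided by total reference length):

```python
def cer(references, hypotheses):
    """Character error rate: summed edit distance between each reference
    and hypothesis string, divided by the total reference character count."""
    total_edits, total_chars = 0, 0
    for ref, hyp in zip(references, hypotheses):
        # classic dynamic-programming edit distance over characters
        prev = list(range(len(hyp) + 1))
        for i, rc in enumerate(ref, 1):
            curr = [i]
            for j, hc in enumerate(hyp, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (rc != hc)))   # substitution
            prev = curr
        total_edits += prev[-1]
        total_chars += len(ref)
    return total_edits / total_chars
```

This also makes the bug fixed by the commit easy to see: the removed line printed a value labeled "WER" but called `cer(...)`, so it would have reported the character-level score under a word-level name.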