polodealvarado/xls-r-300m-es

Automatic Speech Recognition

common_voice_8_0

Generated from Trainer

hf-asr-leaderboard

mozilla-foundation/common_voice_8_0

robust-speech-event

polodealvarado committed on Feb 1, 2022

Commit e68ed8f · 1 Parent(s): 0a6eab5

Update README.md

Files changed (1)

README.md +41 -1

@@ -36,7 +36,47 @@ It achieves the following results on the evaluation set:
 - Loss : 0.1900
 - Wer : 0.146
-## Usage
+## Usage with a 5-gram language model
+The processor includes an n-gram language model, so the model can be used with LM-boosted decoding as follows.
+```python
+import re
+import torch
+from datasets import Audio, load_dataset
+from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM
+processor = Wav2Vec2ProcessorWithLM.from_pretrained("polodealvarado/xls-r-300m-es")
+model = AutoModelForCTC.from_pretrained("polodealvarado/xls-r-300m-es")
+# Cleaning characters: keep only lowercase Spanish letters and spaces
+def remove_extra_chars(batch):
+    chars_to_ignore_regex = '[^a-záéíóúñ ]'
+    text = batch["sentence"]
+    batch["text"] = re.sub(chars_to_ignore_regex, "", text.lower())
+    return batch
+# Preparing dataset: extract model inputs and tokenize the cleaned reference text
+def prepare_dataset(batch):
+    audio = batch["audio"]
+    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
+    with processor.as_target_processor():
+        batch["labels"] = processor(batch["text"]).input_ids
+    return batch
+common_voice_test = load_dataset("mozilla-foundation/common_voice_8_0", "es", split="test", use_auth_token=True)
+# remove_columns returns a new dataset, so the result must be reassigned
+common_voice_test = common_voice_test.remove_columns(["accent", "age", "client_id", "down_votes", "gender", "locale", "segment", "up_votes"])
+# Resample to the 16 kHz rate the model expects
+common_voice_test = common_voice_test.cast_column("audio", Audio(sampling_rate=16_000))
+common_voice_test = common_voice_test.map(remove_extra_chars)
+common_voice_test = common_voice_test.map(prepare_dataset)
+# Transcribe one example; the LM decoder consumes the raw logits
+inputs = processor(common_voice_test[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Greedy (argmax) token ids, for reference only; not used by the LM decoder
+pred_ids = torch.argmax(logits, dim=-1)
+text = processor.batch_decode(logits.numpy()).text
+```
 ## Model description
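
The character-cleaning step in the usage snippet can be tried on its own, without downloading the model or the dataset. The following is a minimal sketch, assuming a standalone helper named `clean_transcript` (a hypothetical name, not part of the README) that applies the same regex as `remove_extra_chars`:

```python
import re

# Same character class as in the README snippet: anything that is not a
# lowercase Spanish letter (a-z, á, é, í, ó, ú, ñ) or a space is removed.
CHARS_TO_IGNORE = re.compile(r"[^a-záéíóúñ ]")


def clean_transcript(text: str) -> str:
    """Lowercase the text, then strip every character outside the allowed set.

    Hypothetical helper for illustration; it mirrors the regex step only.
    """
    return CHARS_TO_IGNORE.sub("", text.lower())


if __name__ == "__main__":
    print(clean_transcript("¡Hola, Señor García!"))  # -> hola señor garcía
    print(clean_transcript("pingüino"))              # -> pingino
```

Note that the character class does not include `ü`, so a word such as "pingüino" loses that letter. Whether this matches the vocabulary used during training is not stated in the README, so it is worth checking before reusing the regex elsewhere.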