UmarRamzan
/

w2v2-bert-ngram-urdu

Automatic Speech Recognition

Generated from Trainer


UmarRamzan committed on May 15

Commit 47acf89 (1 parent: 0cf7378)

Update README.md

Files changed (1)

README.md +20 -19

@@ -10,29 +10,41 @@ model-index:
   results: []
 language:
 - ur
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# w2v2-bert-urdu
-This model is a fine-tuned version of [UmarRamzan/w2v2-bert-urdu](https://huggingface.co/UmarRamzan/w2v2-bert-urdu) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.3681
-- Wer: 0.2573
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
@@ -51,17 +63,6 @@ The following hyperparameters were used during training:
 - num_epochs: 1
 - mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Wer    |
-|:-------------:|:------:|:----:|:---------------:|:------:|
-| 0.4362        | 0.1695 | 50   | 0.4144          | 0.3213 |
-| 0.3776        | 0.3390 | 100  | 0.4029          | 0.3137 |
-| 0.3918        | 0.5085 | 150  | 0.4095          | 0.3060 |
-| 0.3968        | 0.6780 | 200  | 0.3961          | 0.3060 |
-| 0.3685        | 0.8475 | 250  | 0.3681          | 0.2929 |
 ### Framework versions
 - Transformers 4.40.2

   results: []
 language:
 - ur
+datasets:
+- mozilla-foundation/common_voice_17_0
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Wav2Vec-Bert-2.0-Urdu
+This model is a fine-tuned version of [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) on the Urdu split of the [Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) dataset. The fine-tuned model is paired with an n-gram language model trained on the same dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.3681
+- Wer: 0.2407
 ## Model description
+## Usage Instructions
+```python
+from datasets import load_dataset
+import torch
+from transformers import AutoProcessor, Wav2Vec2BertModel
+# Small English LibriSpeech demo clip, used here only to exercise the pipeline
+dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
+dataset = dataset.sort("id")
+sampling_rate = dataset.features["audio"].sampling_rate
+# Load the processor and the bare encoder from the fine-tuned checkpoint
+processor = AutoProcessor.from_pretrained("UmarRamzan/w2v2-bert-ngram-urdu")
+model = Wav2Vec2BertModel.from_pretrained("UmarRamzan/w2v2-bert-ngram-urdu")
+# audio file is decoded on the fly
+inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
+# Wav2Vec2BertModel returns encoder hidden states, not transcriptions
+with torch.no_grad():
+    outputs = model(**inputs)
+```
 ## Training procedure
 - num_epochs: 1
 - mixed_precision_training: Native AMP
 ### Framework versions
 - Transformers 4.40.2
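
The `Wer` figures reported in the card are word error rates. The card does not say how the Trainer computed them, so the following is only a minimal sketch of the usual definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the sample strings are illustrative and do not come from the repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
example_wer = word_error_rate("a b c d", "a x c")
```

On this reading, a score of 0.2407 corresponds to roughly 24 word-level errors per 100 reference words. Text normalization (punctuation, casing, diacritics) before scoring can shift the number and is not covered by this sketch.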