File size: 2,479 Bytes

8bc572b
 
 
 
 
 
 
 
 
f4ac233
 
 
 
 
8bc572b
f4ac233
8bc572b
 
 
 
 
a8959cb
8bc572b
a8959cb
 
 
 
8bc572b
 
 
5465aab
a8959cb
 
 
 
 
 
 
 
 
 
 
 
 
 
8bc572b
 
f4ac233
8bc572b
f4ac233
 
8bc572b
f4ac233
8bc572b
f4ac233
8bc572b
 
 
f4ac233
8bc572b
 
 
f4ac233
8bc572b
f4ac233
8bc572b
f4ac233
 
8bc572b
 
f4ac233
 
 
 
8bc572b
f4ac233
8bc572b
f4ac233
8bc572b
 
 
f4ac233
8bc572b
f4ac233
8bc572b
 
 
 
 
 
f4ac233

---
language: en
datasets:
- librispeech_asr
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
license: apache-2.0
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: clean
      split: test
      args: 
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 1.84
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args: 
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 3.71
---

# Wav2Vec2-Base-960h + 4-gram

This model is identical to [Facebook's Wav2Vec2-Large-960h-lv60-self](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), but is 
augmented with an English 4-gram. The `4-gram.arpa.gz` of [Librispeech's official ngrams](https://www.openslr.org/11) is used.
 
 ## Evaluation
 
 This code snippet shows how to evaluate **patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram** on LibriSpeech's "clean" and "other" test data.
 
```python
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor
import torch
from jiwer import wer

model_id = "patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram"

librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

def map_to_pred(batch):
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

    inputs = {k: v.to("cuda") for k,v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

print(wer(result["text"], result["transcription"]))
```

*Result (WER)*:

| "clean" | "other" |
|---|---|
| 1.84 | 3.71 |