---
language: en
datasets:
- librispeech_asr
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
license: apache-2.0
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: patrickvonplaten/hubert-xlarge-ls960-ft-4-gram
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: clean
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 1.71
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 3.06
---

# Hubert-XLarge-ls960-ft + 4-gram

This model is identical to [Facebook's hubert-xlarge-ls960-ft](https://huggingface.co/facebook/hubert-xlarge-ls960-ft), but is
augmented with an English 4-gram language model for decoding. The language model is the `4-gram.arpa.gz` file from [LibriSpeech's official n-grams](https://www.openslr.org/11).
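
To make concrete what adding an n-gram changes at decoding time, here is a toy sketch that picks between candidate transcriptions by adding a weighted language-model log-probability to the acoustic log-probability. Everything in it is hypothetical: the candidate strings, their scores, the `alpha` and `beta` weights, and the helper names `fused_score` and `best_candidate` are made up for illustration. It is not the decoder this model ships with, which works on the CTC outputs directly rather than on a fixed candidate list.

```python
def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.0):
    """Acoustic score plus weighted LM score plus a per-word bonus (toy fusion)."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


def best_candidate(candidates, alpha=0.5, beta=1.0):
    """Return the text of the candidate with the highest fused score."""
    return max(
        candidates,
        key=lambda c: fused_score(c[1], c[2], len(c[0].split()), alpha, beta),
    )[0]


# Hypothetical candidates: (text, acoustic log-prob, LM log-prob).
# The numbers are invented; a real LM log-prob would come from the 4-gram.
candidates = [
    ("I SCREAM FOR ICE CREAM", -4.0, -9.0),
    ("EYE SCREAM FOR ICE CREAM", -3.8, -15.0),
]

# Without the LM, the acoustically likelier (but implausible) text wins.
acoustic_only = max(candidates, key=lambda c: c[1])[0]
# With the LM term, the more plausible word sequence is preferred.
with_lm = best_candidate(candidates)

print("acoustic only:", acoustic_only)
print("with LM:      ", with_lm)
```

The point of the sketch is only the trade-off: the language model can overrule a small acoustic preference when the resulting word sequence is much more probable.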

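## Evaluation

The snippet below evaluates `patrickvonplaten/hubert-xlarge-ls960-ft-4-gram` on LibriSpeech's "other" test set; change the dataset config to "clean" to evaluate on the clean test set. Decoding with the language model requires `pyctcdecode` and `kenlm` to be installed.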

```python
import torch
from datasets import load_dataset
from jiwer import wer
from transformers import AutoModelForCTC, AutoProcessor

model_id = "patrickvonplaten/hubert-xlarge-ls960-ft-4-gram"

# Use the "clean" config instead of "other" to evaluate on the clean test set.
librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)


def map_to_pred(batch):
    # Convert the raw waveform into model inputs and move them to the GPU.
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    inputs = {k: v.to("cuda") for k, v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Decode with the language model; `.text` holds one transcription per sample.
    transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
    batch["transcription"] = transcription
    return batch


result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

# jiwer returns the WER as a fraction; multiply by 100 for a percentage.
print(wer(result["text"], result["transcription"]))
```

Result (WER, in %):

| "clean" | "other" |
|---|---|
| 1.71 | 3.06 |
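
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed for a single sentence pair: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not part of the evaluation code, and the sketch assumes a non-empty reference. The table values are on a percentage scale, whereas `jiwer`'s `wer` returns a fraction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentence pair: one substituted word out of six.
reference = "THE CAT SAT ON THE MAT"
hypothesis = "THE CAT SAT ON A MAT"

rate = word_error_rate(reference, hypothesis)
print(f"WER: {rate:.4f} (fraction) = {100 * rate:.2f} %")
```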