metadata
language:
- ar
license: apache-2.0
tags:
- automatic-speech-recognition
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_8_0
metrics:
- wer
- cer
model-index:
- name: Sinai Voice Arabic Speech Recognition Model
results:
- task:
type: automatic-speech-recognition
name: Speech Recognition
dataset:
type: mozilla-foundation/common_voice_8_0
name: Common Voice ar
args: ar
metrics:
- type: wer
value: 0.18
name: Test WER
- type: cer
value: 0.051
name: Test CER
WER: 0.18855042016806722
CER: 0.05138746531806014
Sinai Voice Arabic Speech Recognition Model
نموذج صوت سيناء للتعرف على الأصوات العربية الفصحى و تحويلها إلى نصوص
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice 8 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.22
- Wer: 0.189
- Cer: 0.051
Evaluation Commands
- To evaluate on
mozilla-foundation/common_voice_8_0
with splittest
python eval.py --model_id bakrianoo/sinai-voice-ar-stt --dataset mozilla-foundation/common_voice_8_0 --config ar --split test
Inference Without LM
from transformers import (Wav2Vec2Processor, Wav2Vec2ForCTC)
import torchaudio
import torch
def speech_file_to_array_fn(voice_path, resampling_to=16000):
speech_array, sampling_rate = torchaudio.load(voice_path)
resampler = torchaudio.transforms.Resample(sampling_rate, resampling_to)
return resampler(speech_array)[0].numpy(), sampling_rate
# load the model
cp = "bakrianoo/sinai-voice-ar-stt"
processor = Wav2Vec2Processor.from_pretrained(cp)
model = Wav2Vec2ForCTC.from_pretrained(cp)
# recognize the text in a sample sound file
sound_path = './my_voice.mp3'
sample, sr = speech_file_to_array_fn(sound_path)
inputs = processor([sample], sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
logits = model(inputs.input_values,).logits
predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 10
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 8.32
- mixed_precision_training: Native AMP