license: mit
language: ar
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- asr
- automatic speech recognition
Model Card for Arabic ASR with NeMo Conformer CTC
Model Details
Model Name: NeMo-Conformer-CTC-Arabic-ASR
Model Type: Conformer CTC (Connectionist Temporal Classification), small variant
Language: Arabic
License: MIT
Model Creator: Mostafa Ahmed
Contact Information: [email protected]
Model Version: 1.0
Overview
NeMo-Conformer-CTC-Arabic-ASR is a fine-tuned version of the NeMo Conformer CTC (small) model for Automatic Speech Recognition (ASR) in Arabic. It converts spoken Arabic into written text and is aimed at applications such as transcription services, voice assistants, and accessibility tools.
Intended Use
The model is intended for use in:
- Automatic Speech Recognition (ASR) systems for Arabic
- Transcription services for Arabic audio
- Voice assistants and conversational agents
- Accessibility tools for Arabic speakers
Training Data
The model was fine-tuned on the Arabic portion of the Common Voice dataset, an open-source collection of transcribed speech. The dataset covers a variety of speakers and recording conditions, which is intended to help the model generalize across different scenarios.
Data Sources:
- Common Voice: A multilingual dataset for speech recognition tasks.
Training Procedure
The model was trained using NVIDIA's NeMo framework. The training process involved:
- Preprocessing the Common Voice dataset and converting it to manifest files that pair each audio file with its transcription for ASR training.
- Fine-tuning the pre-trained Conformer CTC model on the Arabic Common Voice dataset.
- Evaluating the model's performance using standard ASR metrics (Word Error Rate, WER).
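The manifest conversion in the first step above can be sketched as follows. This is a minimal illustration, not the author's actual preprocessing script: it assumes the usual NeMo manifest layout of one JSON object per line with `audio_filepath`, `duration`, and `text` keys, and the helper name `build_manifest_lines` and the sample path are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def build_manifest_lines(samples: Iterable[Tuple[str, float, str]]) -> List[str]:
    """Turn (audio_path, duration_seconds, transcript) tuples into manifest lines.

    Assumption: each line is a standalone JSON object with the keys
    "audio_filepath", "duration", and "text".
    """
    lines = []
    for audio_path, duration_sec, text in samples:
        entry = {
            "audio_filepath": audio_path,
            "duration": round(duration_sec, 3),
            "text": text.strip(),
        }
        # ensure_ascii=False keeps Arabic characters readable instead of \u escapes
        lines.append(json.dumps(entry, ensure_ascii=False))
    return lines


def write_manifest(path: str, samples: Iterable[Tuple[str, float, str]]) -> None:
    """Write one JSON object per line to the given manifest file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_manifest_lines(samples):
            f.write(line + "\n")
```

In practice the durations would come from the audio files themselves, and any text normalization (for example, handling of diacritics) would be applied before writing the transcript.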
Evaluation Results
The model was evaluated on the Arabic portion of the Common Voice dataset. WER is reported on the training and validation splits as well as on the held-out test split:
- Word Error Rate (WER): 30% on train, 32% on validation, and 40% on test (decoded without a language model)
WER is the number of word-level substitutions, deletions, and insertions divided by the number of words in the reference transcript, so lower values mean more accurate transcription.
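For readers unfamiliar with the metric, WER can be computed as a word-level edit distance divided by the reference length. The sketch below is a plain-Python illustration of that definition, not the evaluation code used for the figures above; the function name `word_error_rate` is hypothetical, and it assumes words are separated by whitespace with no further text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words gives a WER of 0.25.
print(word_error_rate("مرحبا بكم في القاهرة", "مرحبا بكم في القاهره"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.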
How to Use
You can load and use the model with the NeMo framework as follows:
import nemo.collections.asr as nemo_asr
# Load the model
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("MostafaAhmed98/Conformer-CTC-Arabic-ASR")
# Example usage
audio_file = "path/to/arabic_audio.wav"
transcription = asr_model.transcribe([audio_file])
print(transcription[0]) # Output: Transcribed Arabic text
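Depending on the installed NeMo version, `transcribe` may return plain strings or hypothesis objects that expose the text through a `.text` attribute. The helper below is a small sketch for normalizing either form to strings; the name `to_text` is hypothetical, and the `.text` attribute is an assumption about the returned objects rather than something stated in this model card.

```python
from typing import Any, List


def to_text(results: List[Any]) -> List[str]:
    """Return plain transcript strings from a list of transcribe() results.

    Assumption: each item is either a str or an object with a `.text` attribute.
    """
    texts = []
    for item in results:
        if isinstance(item, str):
            texts.append(item)
        else:
            texts.append(item.text)
    return texts
```

With this in place, `to_text(asr_model.transcribe([audio_file]))[0]` yields a string in either case.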