metadata

license: mit
language: ar
datasets:
  - mozilla-foundation/common_voice_17_0
metrics:
  - wer
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
  - asr
  - automatic speech recognition

Model Card for Arabic ASR with NeMo Conformer CTC

Model Details

Model Name: NeMo-Conformer-CTC-Arabic-ASR

Model Type: Conformer CTC (Connectionist Temporal Classification) (small)

Language: Arabic

License: MIT

Model Creator: Mostafa Ahmed

Contact Information: [email protected]

Model Version: 1.0

Overview

NeMo-Conformer-CTC-Arabic-ASR is a fine-tuned version of the NeMo Conformer CTC model for Automatic Speech Recognition (ASR) in Arabic. It converts spoken Arabic into written text and is suited to applications such as transcription services, voice assistants, and accessibility tools.

Intended Use

The model is intended for use in:

Automatic Speech Recognition (ASR) systems for Arabic
Transcription services for Arabic audio
Voice assistants and conversational agents
Accessibility tools for Arabic speakers

Training Data

The model was fine-tuned on the Arabic portion of the Common Voice dataset (mozilla-foundation/common_voice_17_0), an open-source dataset of transcribed speech. The dataset includes a variety of speakers and audio conditions, which should help the model generalize across recording scenarios.

Data Sources:

Common Voice: A multilingual dataset for speech recognition tasks.

Training Procedure

The model was trained using NVIDIA's NeMo framework. The training process involved:

Preprocessing the Common Voice dataset and converting it to NeMo manifests that pair each audio file with its transcription.
Fine-tuning the pre-trained Conformer CTC model on the Arabic Common Voice dataset.
Evaluating the model's performance using the standard ASR metric, Word Error Rate (WER).
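
The manifest step above can be sketched as follows. NeMo ASR manifests are JSON-lines files in which each line carries the keys audio_filepath, duration (in seconds), and text. The helper names manifest_line and write_manifest, the example paths, and the rounding of durations are illustrative assumptions, not the preprocessing code actually used for this model.

```python
import json


def manifest_line(audio_filepath, duration, text):
    """Serialize one utterance as a single JSON line (hypothetical helper)."""
    entry = {
        "audio_filepath": audio_filepath,
        "duration": round(float(duration), 3),  # seconds; rounding is an assumption
        "text": text.strip(),
    }
    # ensure_ascii=False keeps Arabic characters readable in the file
    return json.dumps(entry, ensure_ascii=False)


def write_manifest(rows, manifest_path):
    """Write (audio_filepath, duration, text) tuples to a JSON-lines manifest."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in rows:
            f.write(manifest_line(audio_filepath, duration, text) + "\n")
```

A call such as write_manifest([("clips/a.wav", 2.5, "مرحبا")], "train_manifest.json") would produce a one-line manifest; the clip path and file name here are placeholders.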

Evaluation Results

The model was evaluated on the Arabic portion of the Common Voice dataset, including a held-out test set. Here are the key performance metrics:

Word Error Rate (WER), without a language model: 30% on Train, 32% on Validation, and 40% on Test

WER is the fraction of reference words that are substituted, deleted, or inserted in the model's transcription, so lower is better. A test WER of 40% means roughly four word errors for every ten reference words.
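
For readers who want to reproduce a WER figure, here is a minimal sketch of the metric for a single utterance, computed as word-level edit distance divided by the number of reference words. The function name word_error_rate is hypothetical, and the sketch applies no text normalization (such as diacritic removal). Whether the figures above were computed with any normalization is not stated in this card.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Corpus-level WER is normally the total number of errors divided by the total number of reference words across all utterances, rather than the mean of per-utterance values. WER can also exceed 100% when the hypothesis contains many insertions.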

How to Use

You can load and use the model with the NeMo framework as follows:

import nemo.collections.asr as nemo_asr

# Load the fine-tuned model (downloaded on first use)
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("MostafaAhmed98/Conformer-CTC-Arabic-ASR")

# transcribe() takes a list of audio file paths
audio_file = "path/to/arabic_audio.wav"
transcription = asr_model.transcribe([audio_file])

# One result per input file, in the same order
print(transcription[0])  # Output: Transcribed Arabic text
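
Depending on the installed NeMo version, each item returned by transcribe() may be a plain string or a hypothesis object that exposes the text through a .text attribute. The helper below is a small sketch for handling both cases. The name to_text is hypothetical, and SimpleNamespace only stands in for a hypothesis object so the snippet runs without NeMo installed.

```python
from types import SimpleNamespace


def to_text(item):
    """Return the transcript string from one transcribe() result item."""
    if isinstance(item, str):
        return item
    text = getattr(item, "text", None)
    if text is None:
        raise TypeError(f"unsupported result type: {type(item).__name__}")
    return text


# Stand-in for a hypothesis-style result (assumption for illustration only)
fake_result = SimpleNamespace(text="مرحبا")
print(to_text(fake_result))
```

With the model loaded as above, to_text(transcription[0]) would then yield the transcript as a string in either case.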