Wolof ASR Model (Based on Whisper-Small) trained on a mixed human- and machine-generated dataset

Model Overview

This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from OpenAI's Whisper-small model. This model aims to provide accurate transcription of Wolof audio data.
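A minimal usage sketch follows, assuming the checkpoint loads through the Hugging Face `transformers` automatic-speech-recognition pipeline and that `ffmpeg` is available for decoding audio files. The repository id is the one shown on this page; the `transcribe` helper is a hypothetical name introduced for illustration only.

```python
# Repository id as shown on the model page.
MODEL_ID = "M9and2M/whisper_small_wolof_mix_hum_mach_data"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file (hypothetical helper, not part of the repository)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    # Assumption: the checkpoint is stored in a transformers-compatible format.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` on a local recording (the file name is a placeholder) should return the transcript as a string. Building the pipeline once and reusing it is preferable when transcribing many files.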

Model Details

  • Model Base: Whisper-small
  • Loss: 0.123
  • WER: 0.16
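
For readers unfamiliar with the metric, the sketch below shows how a word error rate (WER) is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is illustrative only; the card does not state which tool or text normalization produced the 0.16 figure, and the example strings are placeholder tokens rather than Wolof text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution (b -> x) and one deletion (e) over five reference words.
print(word_error_rate("a b c d e", "a x c d"))  # 0.4
```

Read this way, a WER of 0.16 corresponds to roughly 16 word errors per 100 reference words.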

Dataset

The dataset used for training and evaluating this model is a collection from various sources, ensuring a rich and diverse set of Wolof audio samples. The collection, which is available on my Hugging Face account, was filtered to keep only audio clips shorter than 6 seconds. In addition to this dataset, audio from YouTube videos was used to synthesize labeled data. This machine-generated data was mixed into the training set and represents 19% of the data used during training.

  • Training Dataset: 57 hours of human-transcribed audio plus 13 hours of audio with machine-generated transcripts
  • Test Dataset: 10 hours
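
The duration filter and the mixing ratio described above can be sketched as follows. This is not the author's preprocessing code: the `path` and `duration` field names and the sample records are hypothetical, and the split of 57 hours of human-transcribed audio plus 13 hours of machine-transcribed audio is an interpretation of the figures in this section.

```python
MAX_DURATION_S = 6.0  # clips at or above this length are dropped


def keep_short_clips(samples, max_duration_s=MAX_DURATION_S):
    """Keep only the samples whose duration is strictly below the limit."""
    return [s for s in samples if s["duration"] < max_duration_s]


def machine_share(human_hours, machine_hours):
    """Fraction of the training audio that carries machine-generated transcripts."""
    return machine_hours / (human_hours + machine_hours)


# Hypothetical metadata records, for illustration only.
samples = [
    {"path": "clip_a.wav", "duration": 3.2},
    {"path": "clip_b.wav", "duration": 6.0},
    {"path": "clip_c.wav", "duration": 5.9},
]
kept = keep_short_clips(samples)
print([s["path"] for s in kept])  # ['clip_a.wav', 'clip_c.wav']

share = machine_share(human_hours=57, machine_hours=13)
print(round(share * 100))  # 19
```

Under that interpretation, 13 of 70 training hours is about 18.6%, which matches the 19% reported for the machine-generated portion.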

For detailed information about the dataset, please refer to M9and2M/Wolof_ASR_dataset.

Training

The training process was adapted from the code in Finetune Wav2vec 2.0 For Speech Recognition, which was written to fine-tune Wav2Vec 2.0 for speech recognition. Special thanks to the author, Duy Khanh, Le, for providing a robust and flexible training framework.

The model was trained with the following configuration:

  • Seed: 19
  • Training Batch Size: 1
  • Gradient Accumulation Steps: 8
  • Number of GPUs: 2
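
As a rough illustration of what these settings imply, the sketch below derives the effective batch size per optimizer step. It assumes the batch size of 1 is per GPU and that the two GPUs run data-parallel, neither of which is stated above; the `TrainConfig` class is a hypothetical container, not part of the training framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values listed above."""
    seed: int = 19
    per_device_batch_size: int = 1  # assumption: the listed batch size is per GPU
    gradient_accumulation_steps: int = 8
    num_gpus: int = 2

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer update under data parallelism.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_gpus
        )


config = TrainConfig()
print(config.effective_batch_size)  # 16
```

If the batch size of 1 is instead global across both GPUs, the effective batch size would be 8.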

Optimizer: AdamW

  • Learning Rate: 1e-7

Scheduler: OneCycleLR

  • Max Learning Rate: 5e-5
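
The optimizer and scheduler settings can be reproduced in PyTorch roughly as shown below. This is a sketch under assumptions, not the author's training script: the tiny linear layer stands in for the Whisper model, `TOTAL_STEPS` is a made-up value, and every `OneCycleLR` argument other than `max_lr` is left at PyTorch's default because the card does not list them.

```python
import torch
from torch import nn

TOTAL_STEPS = 1000  # hypothetical; the real number of optimizer steps is not stated

model = nn.Linear(4, 2)  # stand-in for the Whisper-small model

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-7)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-5,
    total_steps=TOTAL_STEPS,
)

learning_rates = []
for _ in range(TOTAL_STEPS):
    # Record the learning rate used for this step, then advance the schedule.
    learning_rates.append(optimizer.param_groups[0]["lr"])
    optimizer.step()  # no gradients here, so parameters are left unchanged
    scheduler.step()

print(f"start={learning_rates[0]:.2e} peak={max(learning_rates):.2e}")
```

With PyTorch's defaults, `OneCycleLR` starts from `max_lr / 25` and overwrites the learning rate passed to the optimizer, so the 1e-7 value would only matter if the training framework configures the scheduler differently.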

Acknowledgements

This model was built using OpenAI's Whisper-small architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.

More Information

This model was developed as part of my Master's thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.

Contact

For any inquiries or questions, please contact [email protected]

Model size: 242M params (Safetensors, tensor type F32)

Dataset used to train M9and2M/whisper_small_wolof_mix_hum_mach_data