---
license: cc-by-nc-4.0
language: ddn
metrics:
- wer
tags:
- text-to-audio
- automatic-speech-recognition
- wav2vec2-fine-tuning
- dendi-text-to-speech
model-index:
- name: Dendi Numerals ASR
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: dendi
      type: dendi_numbers_dataset
    metrics:
    - name: Test WER
      type: wer
      value: 18.18
pipeline_tag: automatic-speech-recognition
---

# CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition

This repository contains an Automatic Speech Recognition (ASR) model built specifically to recognize numerals in the Dendi (ddn) language.
The model recognizes numbers from 0 to 1,000,000,000 spoken in Dendi.

This model is part of CreaTiv Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app designed to make calculations accessible in six local languages of Benin, featuring voice reading and AI capabilities.
You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).

CTT-ASR models can be loaded with the 🤗 Transformers library, version 4.4 or later.

## Model Details

The model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Dendi.
When using this model, make sure that your speech input is sampled at 16 kHz.

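One quick way to confirm the 16 kHz requirement before running inference is to read the sampling rate from the file header. The sketch below uses Python's standard `wave` module, which only handles uncompressed PCM WAV files. The helper name `is_expected_sampling_rate` is an illustrative assumption and not part of this repository.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate the model expects, as stated on this card


def is_expected_sampling_rate(path, expected=EXPECTED_RATE):
    """Return True if the PCM WAV file at `path` is sampled at `expected` Hz."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == expected
```

If the check fails, resample the recording to 16 kHz before passing it to the processor.
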
## Usage

To use this model, first install the latest version of the 🤗 Transformers library, along with PyTorch and torchaudio, which the inference snippet imports:

```
pip install --upgrade transformers accelerate torch torchaudio
```

Then, run inference with the following code snippet:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the processor (feature extractor + tokenizer) and the fine-tuned model.
processor = Wav2Vec2Processor.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
model = Wav2Vec2ForCTC.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")

# Load the audio, downmix to mono, and resample to the 16 kHz the model expects.
speech_array, sampling_rate = torchaudio.load("audio_test.wav")
speech_array = speech_array.mean(dim=0)
if sampling_rate != 16_000:
    speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16_000)
inputs = processor(speech_array.numpy(), sampling_rate=16_000, return_tensors="pt", padding=True)

# Run the model without tracking gradients, then decode the most likely token per frame.
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
output = processor.batch_decode(torch.argmax(logits, dim=-1))

print("Output:", output)
```

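The `argmax` followed by `batch_decode` in the snippet above amounts to greedy CTC decoding: pick the most likely token for each audio frame, merge consecutive repeats, drop the blank (padding) token, and turn the word delimiter into a space. The toy sketch below illustrates that collapsing step in plain Python. The vocabulary, the token IDs, and the frame sequence are made up for illustration and are not the model's actual vocabulary.

```python
def ctc_greedy_collapse(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token IDs into text, as greedy CTC decoding does."""
    tokens = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only if it differs from the previous frame and is not the blank.
        if token_id != previous and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous = token_id
    # The word delimiter marks word boundaries in the character sequence.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary and frame-level predictions.
toy_vocab = {0: "<pad>", 1: "|", 2: "n", 3: "d", 4: "a"}
frames = [2, 2, 0, 3, 3, 0, 4, 4, 1, 1, 0, 2, 0, 2]
print(ctc_greedy_collapse(frames, toy_vocab))  # nda nn
```

Note how the blank between the last two `n` frames keeps them as two separate characters, whereas the adjacent repeated frames earlier in the sequence are merged into one.
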
You can listen to the sample audio here:

<audio controls>
<source src="https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals/resolve/main/audio_test.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>

Upon processing the sample audio, the model produces the following output:

```
Output: ['zangu ihaaku nda weiguu']
```

In this case, the output represents the numeral **850** in the Dendi language.

### Evaluation result

On its test set, the model achieves a Word Error Rate (WER) of **18.18%**.

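WER is the word-level edit distance between the reference transcript and the model's output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below computes it in plain Python for illustration. It is not the authors' evaluation script, and the hypothesis string is an invented example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


reference = "zangu ihaaku nda weiguu"  # transcript from the sample output on this card
hypothesis = "zangu ihaaku weiguu"     # invented output with one word dropped
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 25.00%
```

One dropped word out of four reference words gives a WER of 25%. Over a full test set, the errors and reference word counts are typically summed across all utterances before dividing.
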
## Authors

This model was developed by:
- Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32))
- Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji))

## Citation

```bibtex
@misc {
    author    = { {Salim KORA GUERA and Etienne TOVIMAFA} },
    title     = { wav2vec2-xlsr-dendi-ddn-for-numerals },
    year      = 2024,
    url       = { https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals },
    doi       = { 10.57967/hf/2930 },
    publisher = { Hugging Face }
}
```

## License

The model is licensed under **CC-BY-NC 4.0**.