---
license: cc-by-nc-4.0
language: ddn
metrics:
- wer
tags:
- text-to-audio
- automatic-speech-recognition
- wav2vec2-fine-tuning
- dendi-text-to-speech
model-index:
- name: Dendi Numerals ASR
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: dendi
      type: dendi_numbers_dataset
    metrics:
    - name: Test WER
      type: wer
      value: 18.18
pipeline_tag: automatic-speech-recognition
---
# CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition
This repository contains an automatic speech recognition (ASR) model dedicated to recognizing numerals in the Dendi (ddn) language.
The model is designed to recognize numbers from 0 to 1,000,000,000 spoken in Dendi.
This model is part of CreaTiv Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app that makes calculations accessible in six local languages of Benin through voice reading and AI features.
You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).
CTT-ASR models can be used with the 🤗 Transformers library from version 4.4 onwards.
## Model Details
The model is a version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on Dendi speech.
The model expects speech input sampled at 16 kHz; resample audio recorded at other rates before running inference.
## Usage
To use this model, first install or upgrade 🤗 Transformers and the audio libraries used in the example below:
```
pip install --upgrade transformers accelerate torch torchaudio
```
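Then run inference with the following code snippet, which converts the audio to mono and resamples it to 16 kHz when needed:
```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# torchaudio returns a (channels, samples) tensor and the file's sampling rate
speech_array, sampling_rate = torchaudio.load("audio_test.wav")
# Mix down to mono, then resample to the 16 kHz rate the model expects
speech_array = speech_array.mean(dim=0)
if sampling_rate != 16_000:
    speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16_000)

inputs = processor(speech_array.numpy(), sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    # attention_mask is only present if the feature extractor is configured to return it
    logits = model(inputs.input_values, attention_mask=inputs.get("attention_mask")).logits

# Greedy CTC decoding: most likely token per frame, then collapse into text
output = processor.batch_decode(torch.argmax(logits, dim=-1))
print("Output:", output)
```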
You can listen to the sample audio here:
<audio controls>
<source src="https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals/resolve/main/audio_test.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
Upon processing the sample audio, the model produces the following output:
```
Output: ['zangu ihaaku nda weiguu']
```
In this case, the output represents the numeral **850** in the Dendi language.
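In the inference snippet, `processor.batch_decode(torch.argmax(logits, dim=-1))` performs greedy CTC decoding: it keeps the most likely token for each audio frame, merges consecutive repeats, and drops the CTC blank token. The plain-Python sketch below illustrates that idea only. The `greedy_ctc_decode` helper, the toy vocabulary, and the blank id `0` are hypothetical; the real vocabulary and blank (padding) token come from this model's tokenizer, and the sketch assumes the `|` word delimiter that `Wav2Vec2CTCTokenizer` uses by default.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Hypothetical helper: turn per-frame argmax ids into text (greedy CTC)."""
    # 1. Merge consecutive duplicate predictions (e.g. a a a -> a)
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token, which separates genuine repeated characters
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Map the word delimiter to spaces and normalize whitespace
    text = "".join(tokens).replace(word_delimiter, " ")
    return " ".join(text.split())


# Toy vocabulary (assumption); the real one lives in the model's tokenizer
vocab = {0: "<pad>", 1: "|", 2: "n", 3: "d", 4: "a", 5: "h", 6: "i"}
frames = [6, 6, 5, 4, 4, 0, 4, 1, 2, 0, 3, 4, 4, 0]
print(greedy_ctc_decode(frames, vocab))  # prints: ihaa nda
```

Because repeats are merged before blanks are removed, a blank between two identical predictions is what allows a genuine double letter such as `aa` to survive decoding.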
## Evaluation Results
On the test set of the Dendi numerals dataset, the model achieves a word error rate (WER) of **18.18%**.
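WER is the word-level edit distance (substitutions + deletions + insertions) between a transcription and its reference, divided by the number of reference words. The sketch below computes a corpus-level WER (total edits over total reference words, one common convention) in plain Python. It is not the authors' evaluation script, the `word_error_rate` helper is hypothetical, and the transcript pair is made up for illustration; in practice, libraries such as `jiwer` or the `wer` metric in 🤗 Evaluate are commonly used.

```python
def word_error_rate(references, hypotheses):
    """Hypothetical helper: corpus-level WER = total word edits / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits, total_words = 0, 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # Rolling Levenshtein row: dp[j] = edits between r[:i] and h[:j]
        dp = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            prev_diag, dp[0] = dp[0], i
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                prev_diag, dp[j] = dp[j], min(
                    dp[j] + 1,         # deletion of a reference word
                    dp[j - 1] + 1,     # insertion of an extra word
                    prev_diag + cost,  # substitution (or match when cost is 0)
                )
        total_edits += dp[len(h)]
        total_words += len(r)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# Made-up pair: the hypothesis drops one of the four reference words
refs = ["zangu ihaaku nda weiguu"]
hyps = ["zangu ihaaku weiguu"]
print(f"WER: {word_error_rate(refs, hyps) * 100:.2f}%")  # prints: WER: 25.00%
```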
## Authors
This model was developed by:
- Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32))
- Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji))
## Citation
```bibtex
@misc{kora_guera_2024_dendi_numerals,
  author    = {KORA GUERA, Salim and TOVIMAFA, Etienne},
  title     = {wav2vec2-xlsr-dendi-ddn-for-numerals},
  year      = {2024},
  url       = {https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals},
  doi       = {10.57967/hf/2930},
  publisher = {Hugging Face}
}
```
## License
The model is released under the **CC BY-NC 4.0** license.