---
library_name: peft
license: apache-2.0
base_model: openai/whisper-large-v2
tags:
- automatic-speech-recognition
- whisper
- asr
- songhoy
- hsn
- Mali
- MALIBA-AI
- lora
- fine-tuned
- code-switching
- african-language
language:
- hsn
- fr
language_bcp47:
- hsn-ML
- fr-ML
model-index:
- name: songhoy-asr-v1
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: songhoy-asr
      type: custom
      split: test
      args:
        language: hsn
    metrics:
    - name: WER
      type: wer
      value: 16.58
    - name: CER
      type: cer
      value: 4.63
pipeline_tag: automatic-speech-recognition
---

# Songhoy-ASR-v1: First Open-Source Speech Recognition Model for Songhoy

Songhoy-ASR-v1 is, to our knowledge, the **first open-source speech recognition model** for Songhoy, a language spoken by over 3 million people across Mali, Niger, and Burkina Faso. Developed as part of the MALIBA-AI initiative, the model reaches a 16.58% word error rate on our test set and brings speech technology to Songhoy speakers for the first time.

## Model Overview

This model performs strongly on Songhoy speech recognition, with particular strengths in:

- **Pure Songhoy recognition**: Accurate transcription of traditional and contemporary Songhoy speech
- **Code-switching handling**: Effectively manages the natural mixing of Songhoy with French
- **Dialect adaptation**: Works across regional variations of Songhoy
- **Noise resilience**: Maintains accuracy even with moderate background noise

## Performance

Songhoy-ASR-v1 achieves the following results on our test set:

| Metric | Value |
|--------|-------|
| Word Error Rate (WER) | 16.58% |
| Character Error Rate (CER) | 4.63% |

To our knowledge, these are the best publicly reported results for Songhoy speech recognition, making the model suitable for production applications.
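
WER and CER are standard edit-distance metrics: substitutions plus deletions plus insertions, divided by the reference length, counted at the word and character level respectively. They can be reproduced with the Hugging Face `evaluate` library; the snippet below is a generic sketch, not the project's evaluation script, and the transcripts are placeholders.

```python
# Generic WER/CER computation with the `evaluate` library (requires jiwer).
# The reference/prediction strings are placeholders, not Songhoy test data.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["example reference transcript"]
predictions = ["example predicted transcript"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
cer = 100 * cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```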

## Technical Details

The model is a fine-tuned version of OpenAI's Whisper-large-v2, adapted to Songhoy with LoRA (Low-Rank Adaptation). This parameter-efficient fine-tuning approach adapts the model to Songhoy while preserving the multilingual capabilities of the base model.
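
The exact LoRA configuration is not published in this card, so the snippet below is only an illustrative sketch of how such an adapter is typically attached to Whisper-large-v2 with PEFT; the rank, alpha, and target modules are assumptions, not the values actually used.

```python
# Illustrative LoRA setup for Whisper-large-v2 with PEFT.
# Rank, alpha, dropout, and target modules below are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

lora_config = LoraConfig(
    r=32,                                 # assumed adapter rank
    lora_alpha=64,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted in Whisper
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```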

### Training Information

- **Base Model**: openai/whisper-large-v2
- **Fine-tuning Method**: LoRA (Parameter-Efficient Fine-Tuning)
- **Training Dataset**: [coming soon]
- **Training Duration**: 4 epochs
- **Effective Batch Size**: 32 (8 per device with 4 gradient accumulation steps)
- **Learning Rate**: 0.001 with a linear scheduler and 50 warmup steps
- **Mixed Precision**: Native AMP
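
These hyperparameters correspond roughly to the following `Seq2SeqTrainingArguments`. This is a sketch rather than the original training script; any value not listed above (output directory, logging, evaluation cadence) is an illustrative assumption.

```python
# Sketch of training arguments matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./songhoy-asr-v1",   # assumed output directory
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size of 32 on one device
    learning_rate=1e-3,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=4,
    fp16=True,                       # native AMP mixed precision
    eval_strategy="epoch",           # assumed; named `evaluation_strategy` on older releases
    logging_steps=25,                # assumed
    remove_unused_columns=False,     # commonly required for PEFT-wrapped Whisper
    label_names=["labels"],
)
```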

### Training Results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3661        | 1.0    | 245  | 0.3118          |
| 0.2712        | 2.0    | 490  | 0.2215          |
| 0.2008        | 3.0    | 735  | 0.2011          |
| 0.1518        | 3.9857 | 976  | 0.1897          |

## Real-World Applications

Songhoy-ASR-v1 enables numerous applications previously unavailable to Songhoy speakers:

- **Media Transcription**: Automatic subtitling of Songhoy content
- **Voice Interfaces**: Voice-controlled applications in Songhoy
- **Educational Tools**: Language learning and literacy applications
- **Cultural Preservation**: Documentation of oral histories and traditions
- **Healthcare Communication**: Improved access to health information
- **Accessibility Solutions**: Tools for the hearing impaired

## Usage Examples

Official usage examples are coming soon.
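
Until they are published, here is a minimal inference sketch. It assumes this repository hosts a PEFT (LoRA) adapter for openai/whisper-large-v2, as the model card metadata indicates; the repository id follows the citation URL below, and `example.wav` is a placeholder path.

```python
# Minimal inference sketch: load the LoRA adapter on top of Whisper-large-v2
# and transcribe one audio file. Paths and generation settings are illustrative.
import torch
import librosa
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

BASE_MODEL = "openai/whisper-large-v2"
ADAPTER = "MALIBA-AI/songhoy-asr-v1"  # repo id taken from the citation URL below

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = WhisperProcessor.from_pretrained(BASE_MODEL)
base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL).to(device)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# Whisper expects 16 kHz mono audio; "example.wav" is a placeholder path.
audio, _ = librosa.load("example.wav", sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(
        input_features=inputs.input_features.to(device),
        max_new_tokens=225,
    )

transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```

Note that Whisper's tokenizer has no dedicated Songhoy language token, so whether any `forced_decoder_ids` or language/task prompt should be set depends on how the adapter was trained.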

## Limitations

[Coming Soon]

<!--
- Performance varies with different regional dialects of Songhoy
- Very specific technical terminology may have lower accuracy
- Extreme background noise can impact transcription quality
- Very young speakers or non-native speakers may have reduced accuracy
- Limited performance with extremely low-quality audio recordings
-->

## Part of MALIBA-AI's African Language Initiative

Songhoy-ASR-v1 is part of MALIBA-AI's commitment to developing speech technology for all Malian languages. This model represents a significant step toward digital inclusion for Songhoy speakers and demonstrates the potential for high-quality AI systems for African languages.

Our mission of "No Malian Language Left Behind" drives us to develop technologies that:

- Preserve linguistic diversity
- Enable access to digital tools regardless of language
- Support local innovation and content creation
- Bridge the digital divide for all Malians

## Framework Versions

- PEFT 0.14.1.dev0
- Transformers 4.50.0.dev0
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This model is released under the Apache 2.0 license.

## Citation

```bibtex
@misc{songhoy-asr-v1,
  author       = {MALIBA-AI},
  title        = {Songhoy-ASR-v1: Speech Recognition for Songhoy},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MALIBA-AI/songhoy-asr-v1}}
}
```

---

**MALIBA-AI: Empowering Mali's Future Through Community-Driven AI Innovation**

*"No Malian Language Left Behind"*