
---
library_name: peft
license: apache-2.0
base_model: openai/whisper-large-v2
tags:
- automatic-speech-recognition
- whisper
- asr
- songhoy
- hsn
- Mali
- MALIBA-AI
- lora
- fine-tuned
- code-switching
- african-language
language:
- hsn
- fr
language_bcp47:
- hsn-ML
- fr-ML
model-index:
- name: songhoy-asr-v1
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: songhoy-asr
      type: custom
      split: test
      args:
        language: hsn
    metrics:
    - name: WER
      type: wer
      value: 16.58
    - name: CER
      type: cer
      value: 4.63
pipeline_tag: automatic-speech-recognition
---

	# Songhoy-ASR-v1: First Open-Source Speech Recognition Model for Songhoy

Songhoy-ASR-v1 is the first open-source speech recognition model for Songhoy, a language spoken by over 3 million people across Mali, Niger, and Burkina Faso. Developed as part of the MALIBA-AI initiative, it reaches a word error rate of 16.58% and a character error rate of 4.63% on its test set, and it brings speech technology to Songhoy speakers for the first time.

	## Model Overview

This model was fine-tuned for Songhoy speech recognition, with particular attention to:

	- Pure Songhoy recognition: Accurate transcription of traditional and contemporary Songhoy speech
	- Code-switching handling: Effectively manages the natural mixing of Songhoy with French
	- Dialect adaptation: Works across regional variations of Songhoy
	- Noise resilience: Maintains accuracy even with moderate background noise

## Performance Metrics

On the test split of our Songhoy dataset, Songhoy-ASR-v1 achieves:

| Metric | Value |
|--------|-------|
| Word Error Rate (WER) | 16.58% |
| Character Error Rate (CER) | 4.63% |

These are, to our knowledge, the best publicly available results for Songhoy speech recognition, and they make the model a practical starting point for production applications.
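WER and CER are both normalized Levenshtein (edit) distances: the minimum number of substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the reference length. WER counts words; CER counts characters, spaces included. The card does not state which tool or text normalization (casing, punctuation) produced the numbers above, so the pure-Python sketch below is a reference implementation of the standard definitions, not the evaluation script behind 16.58% and 4.63%. The helper names and toy strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference item missing from output
                curr[j - 1] + 1,         # insertion: extra item in output
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(references, hypotheses, unit="word"):
    """Pooled error rate: total edits divided by total reference length."""
    split = str.split if unit == "word" else list  # words for WER, characters for CER
    edits = 0
    length = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        length += len(ref_units)
    return edits / length


# Toy English strings, not project data.
references = ["the cat sat", "on the mat"]
hypotheses = ["the cat sit", "on mat"]
print(f"WER: {corpus_error_rate(references, hypotheses, 'word'):.2%}")
print(f"CER: {corpus_error_rate(references, hypotheses, 'char'):.2%}")
```

Pooling edits and reference lengths over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score; this is the common convention, but whether it was used here is an assumption.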

	## Technical Details

The model is a fine-tuned version of OpenAI's Whisper-large-v2, adapted to Songhoy with LoRA (Low-Rank Adaptation). LoRA keeps the base model's weights frozen and trains only small low-rank matrices added to selected layers, so the Songhoy adaptation is distributed as a lightweight adapter and the original multilingual Whisper weights remain unchanged.
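To show why LoRA is lightweight: it replaces a frozen weight W0 (d_out x d_in) with W0 + (alpha / r) * B @ A, where A is r x d_in and B is d_out x r, so each adapted matrix adds only r * (d_in + d_out) trainable parameters. The card does not list the LoRA rank, scaling factor, or target modules. As an illustration only, the sketch below assumes rank r = 32 adapters on the attention `q_proj` and `v_proj` projections, a configuration common in PEFT examples for Whisper, and counts the resulting trainable parameters. The architecture constants (hidden size 1280, 32 encoder and 32 decoder layers) come from the public whisper-large-v2 config; the function names are hypothetical.

```python
# Whisper-large-v2 architecture constants (from its public config).
D_MODEL = 1280
ENCODER_LAYERS = 32
DECODER_LAYERS = 32


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


def whisper_lora_params(rank, modules_per_attention=2):
    """Assumed setup: adapt `modules_per_attention` square projections (e.g. q_proj, v_proj) per attention block."""
    # Encoder layers have one attention block (self-attention);
    # decoder layers have two (self-attention and cross-attention).
    attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
    adapted_matrices = attention_blocks * modules_per_attention
    return adapted_matrices * lora_params(D_MODEL, D_MODEL, rank)


print(f"{whisper_lora_params(32):,} trainable LoRA parameters at rank 32")
```

Under these assumptions the adapter trains 15,728,640 parameters, about 1% of the base model's roughly 1.55 billion, which is why the Songhoy weights can ship as a small add-on to the frozen Whisper checkpoint.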

	### Training Information
	- Base Model: openai/whisper-large-v2
- Fine-tuning Method: LoRA, via the PEFT (Parameter-Efficient Fine-Tuning) library
- Training Dataset: [coming soon]
- Training Duration: 4 epochs
- Effective Batch Size: 32 (8 per device × 4 gradient accumulation steps)
- Learning Rate: 0.001, linear schedule with 50 warmup steps
	- Mixed Precision: Native AMP
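The batch and learning-rate settings above can be made concrete. The sketch below recomputes the effective batch size and the learning rate at a given optimizer step for a linear schedule with warmup, using the same formula as `get_linear_schedule_with_warmup` in Transformers: the rate rises linearly from 0 to 0.001 over the first 50 steps, then decays linearly to 0 at the final step. Two values are assumptions: the total of 976 optimizer steps is taken from the last step logged in the Training Results table, and a single device is inferred from 8 × 4 = 32.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: implied by 8 x 4 = 32
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_warmup_lr(step, base_lr=1e-3, warmup_steps=50, total_steps=976):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    total_steps=976 is an assumption taken from the last logged training step.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / (total_steps - warmup_steps))


print("effective batch size:", effective_batch)
for step in (0, 25, 50, 513, 976):
    print(step, linear_warmup_lr(step))
```

Because gradients are accumulated over 4 micro-batches before each optimizer update, every step in the schedule corresponds to 32 training examples.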

	### Training Results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3661 | 1.0 | 245 | 0.3118 |
| 0.2712 | 2.0 | 490 | 0.2215 |
| 0.2008 | 3.0 | 735 | 0.2011 |
| 0.1518 | 3.9857 | 976 | 0.1897 |

	## Real-World Applications

	Songhoy-ASR-v1 enables numerous applications previously unavailable to Songhoy speakers:

	- Media Transcription: Automatic subtitling of Songhoy content
	- Voice Interfaces: Voice-controlled applications in Songhoy
	- Educational Tools: Language learning and literacy applications
	- Cultural Preservation: Documentation of oral histories and traditions
	- Healthcare Communication: Improved access to health information
- Accessibility Solutions: Tools for deaf and hard-of-hearing users

	## Usage Examples

	```
	Coming soon
	```

	## Limitations

	[Coming Soon]
	<!--
	- Performance varies with different regional dialects of Songhoy
	- Very specific technical terminology may have lower accuracy
	- Extreme background noise can impact transcription quality
	- Very young speakers or non-native speakers may have reduced accuracy
	- Limited performance with extremely low-quality audio recordings -->

	## Part of MALIBA-AI's African Language Initiative

	Songhoy-ASR-v1 is part of MALIBA-AI's commitment to developing speech technology for all Malian languages. This model represents a significant step toward digital inclusion for Songhoy speakers and demonstrates the potential for high-quality AI systems for African languages.

	Our mission of "No Malian Language Left Behind" drives us to develop technologies that:
	- Preserve linguistic diversity
	- Enable access to digital tools regardless of language
	- Support local innovation and content creation
	- Bridge the digital divide for all Malians

	## Framework Versions
	- PEFT 0.14.1.dev0
	- Transformers 4.50.0.dev0
	- PyTorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0

	## License

	This model is released under the Apache 2.0 license.

	## Citation

	```bibtex
	@misc{songhoy-asr-v1,
	author = {MALIBA-AI},
	title = {Songhoy-ASR-v1: Speech Recognition for Songhoy},
	year = {2025},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/MALIBA-AI/songhoy-asr-v1}}
	}
	```

	---

	MALIBA-AI: Empowering Mali's Future Through Community-Driven AI Innovation

	"No Malian Language Left Behind"