---
license: mit
datasets:
- mussacharles60/mcv-sw-female-dataset
- mozilla-foundation/common_voice_17_0
language:
- sw
base_model:
- facebook/mms-tts
tags:
- text-to-speech
- swahili
- swahili-text-to-speech
- tts
- swahili-tts
library_name: transformers
---

Swahili female voice text-to-speech model

This is an ongoing effort to develop a Swahili text-to-speech model with a female voice.

Please give it a try.

For inference, try the following:

```python
# Import the required libraries
import torch
import scipy.io.wavfile
from transformers import VitsModel, AutoTokenizer

# Load the model and tokenizer
model = VitsModel.from_pretrained("mussacharles60/swahili-tts-female-voice")
tokenizer = AutoTokenizer.from_pretrained("mussacharles60/swahili-tts-female-voice")

# Tokenize the input text
text = "Mambo vipi ?, Hii ni Myssa Tech sauti ya A.I, kujaribishwa na Mussa Charles"
inputs = tokenizer(text, return_tensors="pt")

# Generate the waveform (gradients are not needed for inference)
with torch.no_grad():
    output = model(**inputs).waveform

# Convert the PyTorch tensor to a 1-D NumPy array
output_np = output.squeeze().cpu().numpy()

# Write the waveform to a WAV file at the model's sampling rate
scipy.io.wavfile.write("female_voice_test.wav", rate=model.config.sampling_rate, data=output_np)
```
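
The script above passes the float32 waveform straight to `scipy.io.wavfile.write`, which stores it as a 32-bit floating-point WAV file. Some audio players and tools only handle integer PCM, so here is a minimal sketch of converting to 16-bit PCM before writing. The helper names `to_int16_pcm` and `write_pcm16_wav` are hypothetical, and the sketch assumes a mono waveform with samples nominally in [-1.0, 1.0].

```python
import numpy as np
import scipy.io.wavfile


def to_int16_pcm(waveform: np.ndarray) -> np.ndarray:
    """Convert a float waveform in [-1.0, 1.0] to 16-bit PCM samples (hypothetical helper)."""
    # Clip first so out-of-range samples saturate instead of wrapping around.
    clipped = np.clip(np.asarray(waveform, dtype=np.float32), -1.0, 1.0)
    # Scale to the int16 range and round to the nearest integer sample.
    return np.round(clipped * 32767).astype(np.int16)


def write_pcm16_wav(path: str, waveform: np.ndarray, sampling_rate: int) -> None:
    """Write a mono float waveform to `path` as a 16-bit PCM WAV file (hypothetical helper)."""
    scipy.io.wavfile.write(path, sampling_rate, to_int16_pcm(waveform))
```

For example, after running the script above: `write_pcm16_wav("female_voice_test_pcm16.wav", output_np, model.config.sampling_rate)`.
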
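
The model card does not say how the model behaves on long inputs. One common pattern, offered here only as an assumption and not as the author's method, is to synthesize one sentence at a time and join the results with short pauses. The helpers `split_sentences` and `synthesize_long_text`, the naive punctuation-based splitting, and the default 0.25 s pause are all hypothetical choices.

```python
import re
from typing import Callable, List

import numpy as np


def split_sentences(text: str) -> List[str]:
    """Hypothetical naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces (e.g. when the input is blank).
    return [p for p in parts if p]


def synthesize_long_text(
    text: str,
    synthesize: Callable[[str], np.ndarray],
    sampling_rate: int,
    pause_seconds: float = 0.25,  # hypothetical default pause between sentences
) -> np.ndarray:
    """Synthesize each sentence separately and join them with short silences.

    `synthesize` is any function mapping one sentence to a 1-D float waveform,
    for example a wrapper around the tokenizer and VitsModel calls above.
    """
    silence = np.zeros(int(round(pause_seconds * sampling_rate)), dtype=np.float32)
    pieces: List[np.ndarray] = []
    for i, sentence in enumerate(split_sentences(text)):
        if i > 0:
            pieces.append(silence)  # insert a pause before every sentence but the first
        pieces.append(np.asarray(synthesize(sentence), dtype=np.float32))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)
```

To use it with the model loaded earlier, pass a small wrapper that tokenizes one sentence, runs `model(**inputs).waveform` under `torch.no_grad()`, and returns `.squeeze().cpu().numpy()`; then write the result with `model.config.sampling_rate` as the rate. Splitting on sentence boundaries keeps each model call short, at the cost that intonation may not carry across sentences.
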
You're all welcome to contribute.

Thanks 🤗