---
license: mit
datasets:
- mussacharles60/mcv-sw-female-dataset
- mozilla-foundation/common_voice_17_0
language:
- sw
base_model:
- facebook/mms-tts
tags:
- text-to-speech
- swahili
- swahili-text-to-speech
- tts
- swahili-tts
library_name: transformers
---

Swahili female voice text-to-speech model

This is a text-to-speech model for a female voice in the Swahili language. It is under continuous development.

Please give it a try. To run inference, use the following:

```python
# Import the required libraries
from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile

# Load the model and tokenizer
model = VitsModel.from_pretrained("mussacharles60/swahili-tts-female-voice")
tokenizer = AutoTokenizer.from_pretrained("mussacharles60/swahili-tts-female-voice")

# Tokenize the text to synthesize
text = "Mambo vipi ?, Hii ni Myssa Tech sauti ya A.I, kujaribishwa na Mussa Charles"
inputs = tokenizer(text, return_tensors="pt")

# Generate the waveform (no gradients are needed for inference)
with torch.no_grad():
    output = model(**inputs).waveform

# Convert the PyTorch tensor to a 1-D NumPy array
output_np = output.squeeze().cpu().numpy()

# Write the audio to a WAV file at the model's sampling rate
scipy.io.wavfile.write("female_voice_test.wav", rate=model.config.sampling_rate, data=output_np)
```

Contributions are welcome. Thanks 🤗
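The inference snippet above passes the floating-point waveform straight to `scipy.io.wavfile.write`, which stores it as a floating-point WAV file. Some audio players may handle 16-bit PCM more reliably, so the following is a minimal sketch of an optional conversion step. The helper name `to_int16_pcm` is hypothetical and not part of this model or of `transformers`, and the sine tone and the 16000 Hz rate are only placeholders standing in for `output_np` and `model.config.sampling_rate`.

```python
import numpy as np


def to_int16_pcm(waveform: np.ndarray) -> np.ndarray:
    """Convert a float waveform (assumed to lie roughly in [-1, 1]) to 16-bit PCM.

    Hypothetical helper for illustration; it is not part of the model's API.
    """
    # Clip first so out-of-range samples saturate instead of wrapping around
    clipped = np.clip(waveform, -1.0, 1.0)
    # Scale to the int16 range and round to the nearest integer sample
    return np.round(clipped * 32767.0).astype(np.int16)


# Placeholder input: a one-second 440 Hz tone standing in for `output_np`
sampling_rate = 16000  # assumption; use model.config.sampling_rate in practice
t = np.arange(sampling_rate) / sampling_rate
demo_waveform = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

pcm = to_int16_pcm(demo_waveform)
```

With the real model output, the converted array could be written with `scipy.io.wavfile.write("female_voice_test_pcm16.wav", rate=model.config.sampling_rate, data=to_int16_pcm(output_np))`; the output filename here is only an example.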