MarmaSpeakTTS: Text-to-Speech Model for Marma Language

This model provides text-to-speech synthesis for the Marma language (ISO 639-3 code: rmz), a Tibeto-Burman language spoken by the Marma people in Bangladesh and Myanmar.

Model Details

Base model: Massively Multilingual Speech (MMS)
Type: Text-to-Speech
Language: Marma (rmz)
Training Data: The model was trained on Marma language audio recordings collected by CLEAR Global.
Training script: https://github.com/translatorswb/finetune-hf-vits-marma
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

Usage

This model can be used with the 🤗 Transformers library:

from transformers import VitsModel, AutoTokenizer, pipeline
import scipy.io.wavfile

# Load model and tokenizer
model = VitsModel.from_pretrained("CLEAR-Global/marmaspeak-tts-v1")
tokenizer = AutoTokenizer.from_pretrained("CLEAR-Global/marmaspeak-tts-v1")

# Create a pipeline
synthesizer = pipeline("text-to-speech", model=model, tokenizer=tokenizer)

# Synthesize text
text = "ကိုတော် ဇာမာ နီရေလည်း၊"  # Marma text example
output = synthesizer(text)

# Save to file
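# Use the sampling rate reported by the pipeline instead of hard-coding it;
# squeeze() drops a leading batch/channel axis if one is present
scipy.io.wavfile.write("output.wav", rate=output["sampling_rate"], data=output["audio"].squeeze())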
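
SciPy writes a floating-point array as a 32-bit float WAV file, which some audio players may not open. The following is a minimal sketch of saving 16-bit PCM instead using only the Python standard library. It assumes the waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0]; `write_pcm16` is a hypothetical helper name and is not part of Transformers or SciPy.

```python
import array
import sys
import wave


def write_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        # Clip out-of-range values so they cannot overflow the 16-bit range
        s = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(s * 32767)))
    # WAV data is little-endian; swap bytes on big-endian machines
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

With the pipeline output from the example above, this could be called as write_pcm16("output_pcm16.wav", output["audio"].squeeze(), output["sampling_rate"]), assuming the audio array reduces to a single channel.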

Limitations and Biases

This is an early version of the model and may have limitations in pronunciation and naturalness.
The model works best with properly normalized Marma text.
Performance may vary based on the complexity and length of the input text.
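
The last two points suggest preprocessing the input before synthesis. The sketch below shows one possible approach, under two assumptions that this model card does not confirm: that "normalized" can be approximated by Unicode NFC normalization plus whitespace collapsing, and that Marma text in Burmese script marks phrase and sentence ends with "၊" (U+104A) and "။" (U+104B). The helper names `normalize_text` and `split_into_chunks` are hypothetical.

```python
import re
import unicodedata

# Assumed phrase/sentence delimiters in Burmese script:
# U+104A ("၊", minor break) and U+104B ("။", sentence end).
# The pattern matches the (possibly empty) whitespace right after one of them.
_BREAKS = re.compile(r"(?<=[\u104A\u104B])\s*")


def normalize_text(text):
    """Apply Unicode NFC and collapse runs of whitespace to single spaces."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text):
    """Split normalized text after each delimiter, keeping the delimiter."""
    parts = _BREAKS.split(normalize_text(text))
    return [p.strip() for p in parts if p.strip()]
```

Each chunk could then be passed to the synthesizer separately and the resulting audio concatenated, which keeps individual inputs short. Whether this improves output quality for this model has not been verified here.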

Training

The model was fine-tuned from a Massively Multilingual Speech (MMS) VITS model using the training recipe at https://github.com/translatorswb/finetune-hf-vits-marma.

Ethical Considerations

This model has been developed with permission and input from Marma language speakers. The voice synthesis should be used responsibly and respectfully.

Citation

@misc{marma-tts,
  author = {CLEAR Global},
  title = {MarmaSpeakTTS: A Text-to-Speech Model for Marma Language},
  year = {2025},
  howpublished = {https://huggingface.co/CLEAR-Global/marmaspeak-tts-v1}
}