Spark TTS Vietnamese

Spark-TTS is an advanced text-to-speech system that uses the power of large language models (LLM) for highly accurate and natural-sounding voice synthesis. It is designed to be efficient, flexible, and powerful for both research and production use. This model is trained from viVoice vietnamese dataset

Usage

First, install the required packages:

pip install --upgrade transformers accelerate

Text-to-Speech

We have customized the code so you can inference using the huggingface transformer library without installing anything else.

from transformers import AutoProcessor, AutoModel, AutoTokenizer
import soundfile as sf
import torch
import numpy as np

device = "cuda"
model_id = "DragonLineageAI/Vi-SparkTTS-0.5B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
processor.model = model
 
prompt_audio_path = "path_to_audio_path" # CHANGE TO YOUR ACTUAL PATH
prompt_transcript = "text corresponding to prompt audio" # Optional
text_input = "xin chào mọi người chúng tôi là Nguyễn Công Tú Anh và Chu Văn An đến từ dragonlineageai"
 
inputs = processor(
    text=text_input.lower(),
    prompt_speech_path=prompt_audio_path,
    prompt_text=prompt_transcript,
    return_tensors="pt"
).to(device)
global_tokens_prompt = inputs.pop("global_token_ids_prompt", None)
 
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=3000,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        eos_token_id=processor.tokenizer.eos_token_id,  
        pad_token_id=processor.tokenizer.pad_token_id  
    )
       
output_clone = processor.decode(
    generated_ids=output_ids,
    global_token_ids_prompt=global_tokens_prompt,
    input_ids_len=inputs["input_ids"].shape[-1]
)
 
sf.write("output_cloned.wav", output_clone["audio"], output_clone["sampling_rate"])

Fintune

You can finetune this model with any dataset to improve quality or train on a new language. training code

DragonLineageAI
/

Vi-SparkTTS-0.5B

Spark TTS Vietnamese

Usage

Text-to-Speech

Fintune

Space using DragonLineageAI/Vi-SparkTTS-0.5B 1