Spark TTS Vietnamese
Spark-TTS is an advanced text-to-speech system that uses large language models (LLMs) to synthesize accurate, natural-sounding speech. It is designed to be efficient and flexible for both research and production use. This model was trained on the viVoice Vietnamese dataset.
Usage
First, install the required packages:
pip install --upgrade transformers accelerate soundfile
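Before running the example, it can help to confirm that the packages it relies on are importable. The snippet below is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of this model's code.

```python
import importlib.util

# Packages named in the install command, plus torch and soundfile,
# which the text-to-speech example imports.
REQUIRED = ["transformers", "accelerate", "torch", "soundfile"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```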
Text-to-Speech
We have customized the code so that you can run inference with the Hugging Face Transformers library, without installing anything else.
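from transformers import AutoProcessor, AutoModel
import soundfile as sf
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "DragonLineageAI/Vi-SparkTTS-0.5B"

# trust_remote_code=True loads the custom processor and model code from the repository.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Move the model to the same device as the inputs.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval().to(device)
# Give the processor a reference to the model.
processor.model = model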
prompt_audio_path = "path_to_audio_path" # CHANGE TO YOUR ACTUAL PATH
prompt_transcript = "text corresponding to prompt audio" # Optional
text_input = "xin chào mọi người chúng tôi là Nguyễn Công Tú Anh và Chu Văn An đến từ dragonlineageai"
inputs = processor(
    text=text_input.lower(),
    prompt_speech_path=prompt_audio_path,
    prompt_text=prompt_transcript,
    return_tensors="pt"
).to(device)
global_tokens_prompt = inputs.pop("global_token_ids_prompt", None)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=3000,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        eos_token_id=processor.tokenizer.eos_token_id,
        pad_token_id=processor.tokenizer.pad_token_id
    )
output_clone = processor.decode(
    generated_ids=output_ids,
    global_token_ids_prompt=global_tokens_prompt,
    input_ids_len=inputs["input_ids"].shape[-1]
)
sf.write("output_cloned.wav", output_clone["audio"], output_clone["sampling_rate"])
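The example above lowercases the input text and caps generation at `max_new_tokens=3000`, so a very long passage may be cut off before it is fully spoken. One possible workaround is to normalize the text and split it into sentence-sized chunks, then synthesize each chunk separately. The sketch below shows only the text-preparation step, using the standard library. The helper names `normalize_text` and `split_into_chunks` and the `max_chars=200` default are assumptions for illustration, not part of the model's API or a limit documented for this model.

```python
import re


def normalize_text(text):
    """Collapse runs of whitespace and lowercase the text,
    mirroring the text_input.lower() call in the example above."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_into_chunks(text, max_chars=200):
    """Split text after sentence-ending punctuation (., !, ?) and pack
    consecutive sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is kept whole."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `text=` to the processor with the same prompt audio, and the resulting waveforms joined before writing the file. That loop is left out here because it depends on the model's custom code.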
Finetune
You can finetune this model on any dataset to improve its quality or to train it on a new language. See the training code.