ⓍTTS 🇦🇷
ⓍTTS is a voice generation model that lets you clone a voice into different languages from a single 6-second audio clip. It does not require hours of training data.
This model was trained by IdeaLab at CITECCA, Universidad Nacional de Rio Negro.
Language
The Spanish language of this model has been fine-tuned on ylacombe's Google Argentinian Spanish dataset to achieve an Argentinian accent.
Training Parameters
batch_size=8,
grad_accum_steps=96,
batch_group_size=48,
eval_batch_size=8,
num_loader_workers=8,
eval_split_max_size=256,
optimizer="AdamW",
optimizer_wd_only_on_weights=True,
optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2},
lr=5e-06,
lr_scheduler="MultiStepLR",
lr_scheduler_params={"milestones": [50000 * 18, 150000 * 18, 300000 * 18], "gamma": 0.5, "last_epoch": -1},
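To make the numbers above easier to read, the following sketch works out the effective batch size and the learning rate that a MultiStepLR schedule with these milestones would produce. It is a plain-Python illustration, not the training script: the helper names are hypothetical, the effective batch size assumes a single device, and whether a "step" is an iteration or an epoch depends on the trainer configuration, which is not stated here.

```python
from bisect import bisect_right

# Values copied from the training parameters listed above.
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 96
BASE_LR = 5e-06
MILESTONES = [50000 * 18, 150000 * 18, 300000 * 18]  # 900000, 2700000, 5400000
GAMMA = 0.5


def effective_batch_size(batch_size, grad_accum_steps):
    """Samples contributing to one optimizer update (single device assumed)."""
    return batch_size * grad_accum_steps


def multistep_lr(step, base_lr=BASE_LR, milestones=MILESTONES, gamma=GAMMA):
    """Closed form of a MultiStepLR schedule: multiply by gamma at each milestone reached."""
    return base_lr * gamma ** bisect_right(milestones, step)


if __name__ == "__main__":
    print(effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))  # 768
    for step in (0, 900000, 2700000, 5400000):
        print(step, multistep_lr(step))
```

With these settings each optimizer update sees 768 samples, and the learning rate halves at each of the three milestones.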
License
This model is licensed under the Coqui Public Model License (CPML). There's a lot that goes into a license for generative models, and you can read more of the origin story of CPML here.
Using 🐸TTS Command line:
tts --model_name /path/to/xtts/ \
--text "Che boludo, vamos a tomar unos mates." \
--speaker_wav /path/to/target/speaker.wav \
--language_idx es \
--use_cuda true
Using the model directly:
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration and the fine-tuned checkpoint.
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()  # requires a CUDA-capable GPU

# Synthesize Spanish speech in the voice of the reference clip.
outputs = model.synthesize(
    "Che boludo, vamos a tomar unos mates.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",  # replace with your own reference clip
    gpt_cond_len=3,
    language="es",
)
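The snippet above stops at the synthesis call. As a minimal sketch of how the result might be written to disk, the helper below saves mono float samples as a 16-bit WAV file using only the standard library. The helper name save_wav is hypothetical, and it is an assumption (not confirmed by this card) that the waveform is available as outputs["wav"], holds floats in [-1, 1], and uses a 24 kHz sample rate.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of frames written. The 24000 Hz default is an
    assumption about the model's output rate.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # avoid int16 overflow
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


# Hypothetical usage with the result of model.synthesize(...):
# save_wav(outputs["wav"], "salida.wav")
```

Libraries such as soundfile or torchaudio would do the same job in one call; the standard-library version is used here only to keep the sketch free of extra dependencies.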