README.md · bene-ges/tts_ru_hifigan

---
license: cc-by-nc-4.0
language:
  - ru
library_name: nemo
tags:
  - tts
  - text-to-speech
  - Vocoder
---

For an example of the full inference pipeline for Russian TTS (G2P + FastPitch + HiFi-GAN), see this notebook, or use this bash script.

This model accepts batches of mel spectrograms.
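As a rough illustration of what a batch of mel spectrograms might look like, the sketch below pads spectrograms of different lengths to a common number of frames. The `(batch, n_mels, frames)` layout, the 80 mel bins, the padding value, and the helper name `pad_mel_batch` are all assumptions made for this example and are not taken from the model configuration. In practice the batch would be a tensor produced by the upstream FastPitch model; plain lists are used here only to keep the example dependency-free.

```python
def pad_mel_batch(mels, pad_value=0.0):
    """Pad spectrograms to a common frame count.

    Each item in `mels` is a nested list shaped [n_mels][frames].
    Returns (batch, lengths), where batch is shaped
    [batch][n_mels][max_frames] and lengths holds the original frame counts.
    """
    if not mels:
        raise ValueError("need at least one spectrogram")
    n_mels = len(mels[0])
    if any(len(m) != n_mels for m in mels):
        raise ValueError("all spectrograms must have the same number of mel bins")
    lengths = [len(m[0]) for m in mels]
    max_len = max(lengths)
    # Right-pad every mel row; pad_value is an assumption and should match
    # whatever the upstream spectrogram generator uses for padding.
    batch = [
        [row + [pad_value] * (max_len - len(row)) for row in m]
        for m in mels
    ]
    return batch, lengths


# Two dummy spectrograms with 80 mel bins (assumed) and 3 and 5 frames.
shorter = [[1.0] * 3 for _ in range(80)]
longer = [[2.0] * 5 for _ in range(80)]
batch, lengths = pad_mel_batch([shorter, longer])
```

Keeping the original lengths alongside the padded batch makes it possible to trim the padded tail of each generated waveform afterwards.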

This model outputs audio at 22050 Hz.
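Since the output rate is fixed, any file written from the generated waveform should declare the same sample rate, otherwise playback will be pitch-shifted. The following is a minimal sketch using only the Python standard library. It assumes the waveform is a mono sequence of floats in [-1.0, 1.0]; a synthetic 440 Hz tone stands in for real vocoder output, and `write_wav` and `OUT_PATH` are names introduced for this example.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 22050  # output rate stated in this model card


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))


# Stand-in for vocoder output: one second of a 440 Hz tone.
audio = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
OUT_PATH = os.path.join(tempfile.gettempdir(), "vocoder_output_example.wav")
write_wav(OUT_PATH, audio)
```

With real model output, the waveform would first be moved to the CPU and converted to a flat sequence of floats before being passed to `write_wav`.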

Training

The model was trained with the NeMo toolkit [1] for several epochs. The full training script is here.

The model was trained on the RUSLAN [2] corpus (single speaker, male voice), sampled at 22050 Hz.

[1] NVIDIA NeMo Toolkit
[2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham