---
license: apache-2.0
language:
- de
library_name: nemo
tags:
- tts
- pytorch
- FastPitch
- speech
pipeline_tag: text-to-speech
---

This FastPitch[1] model was trained on the clean version of the HUI-Audio-Corpus-German[2] dataset using the NeMo Toolkit[3].

We selected the 5 speakers with the largest amount of data and balanced the training data across them (around 20 hours per speaker).
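
The speaker selection and balancing described above could look roughly like the sketch below. It is not the authors' actual preprocessing script. It assumes a NeMo-style manifest already loaded as a list of dicts with hypothetical `speaker` and `duration` (seconds) keys, and it takes each speaker's utterances in manifest order until that speaker reaches about 20 hours.

```python
from collections import defaultdict

# Roughly 20 hours of audio per speaker, expressed in seconds
TARGET_SECONDS = 20 * 3600


def select_balanced_speakers(entries, num_speakers=5, target_seconds=TARGET_SECONDS):
    """Keep the speakers with the most audio and cap each at target_seconds.

    Assumption: `entries` is a list of manifest-style dicts with hypothetical
    keys "speaker" (speaker ID) and "duration" (utterance length in seconds).
    """
    # Group utterances by speaker and sum their durations
    by_speaker = defaultdict(list)
    totals = defaultdict(float)
    for entry in entries:
        by_speaker[entry["speaker"]].append(entry)
        totals[entry["speaker"]] += entry["duration"]

    # Keep the num_speakers speakers with the largest total duration
    top_speakers = sorted(totals, key=totals.get, reverse=True)[:num_speakers]

    # For each kept speaker, add utterances until the next one would exceed the cap
    selected = []
    for speaker in top_speakers:
        kept_seconds = 0.0
        for entry in by_speaker[speaker]:
            if kept_seconds + entry["duration"] > target_seconds:
                break
            selected.append(entry)
            kept_seconds += entry["duration"]
    return selected
```

Taking utterances in manifest order is the simplest choice; shuffling each speaker's list first would avoid biasing the subset toward whichever recordings happen to come first.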

This is a retrained version of the following model:

https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_de_fastpitch_multispeaker_5

# How to Use

Use with NeMo Toolkit version 1.14.0.

```python
import torch
import torchaudio
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# Load the spectrogram generator (FastPitch) from the downloaded .nemo file
spec_generator = FastPitchModel.restore_from("path/to/model.nemo")
spec_generator.eval()

# Load the HiFi-GAN vocoder
model = HifiGanModel.from_pretrained(model_name="tts_de_hui_hifigan_ft_fastpitch_multispeaker_5")
model.eval()

# Generate audio
text = "Hallo, das ist ein Test."  # German input text to synthesize
speaker_id = 0  # index of one of the 5 training speakers

with torch.no_grad():
    parsed = spec_generator.parse(text)
    spectrogram = spec_generator.generate_spectrogram(tokens=parsed, speaker=speaker_id)
    audio = model.convert_spectrogram_to_audio(spec=spectrogram)

# Save the audio to disk in a file called german_speech.wav (44.1 kHz)
torchaudio.save("german_speech.wav", audio.cpu(), 44100)
```

[1] FastPitch: Parallel Text-to-speech with Pitch Prediction: https://arxiv.org/abs/2006.06873

[2] HUI-Audio-Corpus-German Dataset: https://opendata.iisys.de/datasets.html

[3] NVIDIA NeMo Toolkit: https://github.com/NVIDIA/NeMo