|
--- |
|
library_name: speechbrain |
|
pipeline_tag: text-to-speech |
|
language: "en" |
|
tags: |
|
- text-to-speech |
|
- TTS |
|
- speech-synthesis |
|
- speechbrain |
|
license: "apache-2.0" |
|
datasets: |
|
- LJSpeech |
|
--- |
|
|
|
# Text-to-Speech (TTS) with Transformer trained on LJSpeech |
|
|
|
This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [Transformer](https://arxiv.org/pdf/1809.08895.pdf) pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/). |
|
|
|
The pre-trained model takes text as input and produces a spectrogram as output. The final waveform can be obtained by applying a vocoder (e.g., HiFiGAN) on top of the generated spectrogram.
|
|
|
### Perform Text-to-Speech (TTS) |
|
|
|
```python
import torchaudio
from speechbrain.inference.vocoders import HIFIGAN

texts = ["This is the example text"]

# Initialize the TTS model.
# Note: TextToSpeech is the custom inference interface shipped with this repository;
# `source` should point to the directory (or repo) containing its hyperparameters and checkpoint.
my_tts_model = TextToSpeech.from_hparams(source="/content/")

# Initialize the vocoder (HiFiGAN)
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Run the TTS (text-to-spectrogram)
mel_output = my_tts_model.encode_text(texts)

# Run the vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)

# Save the waveform (LJSpeech uses a 22.05 kHz sampling rate)
torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
```
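
### Synthesizing multiple sentences

Since `texts` is a list, several sentences can be synthesized in one call and saved as separate files. The sketch below is a minimal example reusing the `my_tts_model` and `hifi_gan` objects created above; it assumes the interface returns one spectrogram and one waveform per input sentence, as in the single-sentence example.

```python
# Several input sentences synthesized together (reusing my_tts_model and hifi_gan from above)
texts = [
    "Text to speech turns written sentences into audio.",
    "This transformer model was trained on LJSpeech.",
]

mel_outputs = my_tts_model.encode_text(texts)   # text -> spectrograms
waveforms = hifi_gan.decode_batch(mel_outputs)  # spectrograms -> waveforms

# Save one file per input sentence at the 22.05 kHz LJSpeech sampling rate
for i, waveform in enumerate(waveforms):
    torchaudio.save(f"example_TTS_{i}.wav", waveform, 22050)
```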
|
|
|
|