---
library_name: fairseq
task: text-to-speech
tags:
- fairseq
- audio
- text-to-speech
language: en
datasets:
- ljspeech
---
## Example: downloading the TTS Transformer with fairseq

The following should work with the most recent version of fairseq in a Google Colab notebook:

```python
import g2p_en
import IPython.display as ipd
import torch
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub

# Download the checkpoint from the Hugging Face Hub and build the model and task,
# selecting the Griffin-Lim vocoder and disabling fp16.
model_ensemble, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/tts_transformer-en-ljspeech",
    arg_overrides={"vocoder": "griffin_lim", "fp16": False},
)


def tokenize(text):
    """Convert text to a space-separated phoneme string."""
    phonemes = g2p_en.G2p()(text)
    # Map "," and ";" to the "sp" token.
    phonemes = [{",": "sp", ";": "sp"}.get(p, p) for p in phonemes]
    # Drop spaces and any remaining punctuation.
    return " ".join(p for p in phonemes if p.isalnum())


text = "Hello, this is a test run."

tokenized = tokenize(text)
sample = {
    "net_input": {
        # Batch of one sentence: shape (1, num_tokens).
        "src_tokens": task.src_dict.encode_line(tokenized).view(1, -1),
        "src_lengths": torch.tensor([len(tokenized.split())], dtype=torch.long),
        "prev_output_tokens": None,
    },
    "target_lengths": None,
    "speaker": None,
}

generator = task.build_generator(model_ensemble, cfg)
generation = generator.generate(model_ensemble[0], sample)
waveform = generation[0]["waveform"]

# Play the synthesized audio in the notebook.
ipd.Audio(waveform, rate=task.sr)
```
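
To see what the `tokenize` helper does after grapheme-to-phoneme conversion, the sketch below applies the same punctuation mapping and filter to a hand-written phoneme list, so it runs without `g2p_en` or fairseq. The function name `postprocess_phonemes` and the input list are hypothetical; the list is an illustrative guess at what a phonemizer might return, not actual `g2p_en` output.

```python
def postprocess_phonemes(phonemes):
    """Apply the same mapping and filtering as tokenize() above (hypothetical helper)."""
    # Replace "," and ";" with the "sp" token.
    mapped = [{",": "sp", ";": "sp"}.get(p, p) for p in phonemes]
    # Keep only alphanumeric tokens, which drops spaces and other punctuation.
    return " ".join(p for p in mapped if p.isalnum())


# Hand-written, illustrative input (an assumption, not real g2p_en output).
phonemes = ["HH", "AH0", "L", "OW1", ",", " ", "W", "ER1", "L", "D", "."]
print(postprocess_phonemes(phonemes))  # prints "HH AH0 L OW1 sp W ER1 L D"
```

The comma survives as `sp` because it is mapped before the alphanumeric filter runs, while the space and the final period are dropped.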