Update README.md
README.md CHANGED
@@ -66,7 +66,7 @@ Using Parler-TTS is as simple as "bonjour". Simply install the library once:
 pip install git+https://github.com/huggingface/parler-tts.git
 ```
 
-###
+### Inference
 
 
 **Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text prompt, for example:
@@ -94,35 +94,6 @@ audio_arr = generation.cpu().numpy().squeeze()
 sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
 ```
 
-### 🎯 Using a specific speaker
-
-To ensure speaker consistency across generations, this checkpoint was also trained on 34 speakers, characterized by name (e.g. Jon, Lea, Gary, Jenna, Mike, Laura).
-
-To take advantage of this, simply adapt your text description to specify which speaker to use: `Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise.`
-
-```py
-import torch
-from parler_tts import ParlerTTSForConditionalGeneration
-from transformers import AutoTokenizer
-import soundfile as sf
-
-device = "cuda:0" if torch.cuda.is_available() else "cpu"
-
-model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-multilingual").to(device)
-tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-multilingual")
-description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
-
-prompt = "Hey, how are you doing today?"
-description = "Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
-
-input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
-prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
-
-generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
-audio_arr = generation.cpu().numpy().squeeze()
-sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
-```
-
 **Tips**:
 * We've set up an [inference guide](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md) to make generation faster. Think SDPA, torch.compile, batching and streaming!
 * Include the term "very clear audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
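For context on the speed-ups the first tip points at, here is a minimal sketch of what enabling SDPA attention and `torch.compile` could look like with this checkpoint. This is an illustration, not part of the commit: the `attn_implementation="sdpa"` kwarg and the `torch.compile` call are assumptions based on the standard `transformers`/PyTorch APIs, and the linked INFERENCE.md remains the authoritative reference.

```py
# Sketch only: assumes ParlerTTSForConditionalGeneration accepts the standard
# transformers `attn_implementation` kwarg; see INFERENCE.md for the project's
# own recommendations.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load with scaled-dot-product attention (SDPA) instead of eager attention.
model = ParlerTTSForConditionalGeneration.from_pretrained(
    "parler-tts/parler-tts-mini-multilingual",
    attn_implementation="sdpa",
).to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-multilingual")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Compile the forward pass: the first generation is slow while kernels compile,
# subsequent generations are faster.
model.forward = torch.compile(model.forward)

description = "Jon's voice is monotone yet slightly fast in delivery, with very clear audio."
prompt = "Hey, how are you doing today?"

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```

Note that the description string reuses the "very clear audio" phrasing recommended in the second tip.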