Update README.md
README.md CHANGED
@@ -66,7 +66,7 @@ Using Parler-TTS is as simple as "bonjour". Simply install the library once:
 pip install git+https://github.com/huggingface/parler-tts.git
 ```
 
-###
+### Inference
 
 
 **Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text prompt, for example:
@@ -94,35 +94,6 @@ audio_arr = generation.cpu().numpy().squeeze()
 sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
 ```
 
-### 🎯 Using a specific speaker
-
-To ensure speaker consistency across generations, this checkpoint was also trained on 34 speakers, characterized by name (e.g. Jon, Lea, Gary, Jenna, Mike, Laura).
-
-To take advantage of this, simply adapt your text description to specify which speaker to use: `Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise.`
-
-```py
-import torch
-from parler_tts import ParlerTTSForConditionalGeneration
-from transformers import AutoTokenizer
-import soundfile as sf
-
-device = "cuda:0" if torch.cuda.is_available() else "cpu"
-
-model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-multilingual").to(device)
-tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-multilingual")
-description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
-
-prompt = "Hey, how are you doing today?"
-description = "Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
-
-input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
-prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
-
-generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
-audio_arr = generation.cpu().numpy().squeeze()
-sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
-```
-
 **Tips**:
 * We've set up an [inference guide](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md) to make generation faster. Think SDPA, torch.compile, batching and streaming!
 * Include the term "very clear audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
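For context on the speed-ups the first tip points at, here is a minimal sketch of what enabling SDPA attention and `torch.compile` could look like with this checkpoint. This is an illustration, not part of the commit: the `attn_implementation="sdpa"` kwarg and the `torch.compile` call are assumptions based on the standard `transformers`/PyTorch APIs, and the linked INFERENCE.md remains the authoritative reference.

```py
# Sketch only: assumes ParlerTTSForConditionalGeneration accepts the standard
# transformers `attn_implementation` kwarg; see INFERENCE.md for the project's
# own recommendations.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load with scaled-dot-product attention (SDPA) instead of eager attention.
model = ParlerTTSForConditionalGeneration.from_pretrained(
    "parler-tts/parler-tts-mini-multilingual",
    attn_implementation="sdpa",
).to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-multilingual")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Compile the forward pass: the first generation is slow while kernels compile,
# subsequent generations are faster.
model.forward = torch.compile(model.forward)

description = "Jon's voice is monotone yet slightly fast in delivery, with very clear audio."
prompt = "Hey, how are you doing today?"

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```

Note that the description string reuses the "very clear audio" phrasing recommended in the second tip.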