mrfakename committed on
Commit 532542f · verified · 1 Parent(s): a8c3d97

Sync from GitHub repo

This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions to the Space there

src/f5_tts/infer/README.md CHANGED
@@ -13,7 +13,7 @@ To avoid possible inference failures, make sure you have seen through the follow
13
  - Add some spaces (blank: " ") or punctuations (e.g. "," ".") <ins>to explicitly introduce some pauses</ins>.
14
  - If English punctuation marks the end of a sentence, make sure there is a space " " after it. Otherwise not regarded as when chunk.
15
  - <ins>Preprocess numbers</ins> to Chinese letters if you want to have them read in Chinese, otherwise in English.
16
- - If the generation output is blank (pure silence), <ins>check for ffmpeg installation</ins>.
17
  - Try <ins>turn off `use_ema` if using an early-stage</ins> finetuned checkpoint (which goes just few updates).
18
 
19
 
@@ -129,6 +129,28 @@ ref_text = ""
129
  ```
130
  You should mark the voice with `[main]` `[town]` `[country]` whenever you want to change voice, refer to `src/f5_tts/infer/examples/multi/story.txt`.
131
 
132
  ## Socket Real-time Service
133
 
134
  Real-time voice output with chunk stream:
 
13
  - Add some spaces (blank: " ") or punctuations (e.g. "," ".") <ins>to explicitly introduce some pauses</ins>.
14
  - If English punctuation marks the end of a sentence, make sure there is a space " " after it. Otherwise not regarded as when chunk.
15
  - <ins>Preprocess numbers</ins> to Chinese letters if you want to have them read in Chinese, otherwise in English.
16
+ - If the generation output is blank (pure silence), <ins>check for FFmpeg installation</ins>.
17
  - Try <ins>turn off `use_ema` if using an early-stage</ins> finetuned checkpoint (which goes just few updates).
18
 
19
 
 
129
  ```
130
  You should mark the voice with `[main]` `[town]` `[country]` whenever you want to change voice, refer to `src/f5_tts/infer/examples/multi/story.txt`.
131
 
132
+ ## API Usage
133
+
134
+ ```python
135
+ from importlib.resources import files
136
+ from f5_tts.api import F5TTS
137
+
138
+ f5tts = F5TTS()
139
+ wav, sr, spec = f5tts.infer(
140
+ ref_file=str(files("f5_tts").joinpath("infer/examples/basic/basic_ref_en.wav")),
141
+ ref_text="some call me nature, others call me mother nature.",
142
+ gen_text="""I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. Respect me and I'll nurture you; ignore me and you shall face the consequences.""",
143
+ file_wave=str(files("f5_tts").joinpath("../../tests/api_out.wav")),
144
+ file_spec=str(files("f5_tts").joinpath("../../tests/api_out.png")),
145
+ seed=None,
146
+ )
147
+ ```
148
+ Check [api.py](../api.py) for more details.
149
+
150
+ ## TensorRT-LLM Deployment
151
+
152
+ See [detailed instructions](../runtime/triton_trtllm/README.md) for more information.
153
+
154
  ## Socket Real-time Service
155
 
156
  Real-time voice output with chunk stream:
src/f5_tts/infer/infer_cli.py CHANGED
@@ -323,7 +323,7 @@ def main():
323
  ref_text_ = voices[voice]["ref_text"]
324
  gen_text_ = text.strip()
325
  print(f"Voice: {voice}")
326
- audio_segment, final_sample_rate, spectragram = infer_process(
327
  ref_audio_,
328
  ref_text_,
329
  gen_text_,
 
323
  ref_text_ = voices[voice]["ref_text"]
324
  gen_text_ = text.strip()
325
  print(f"Voice: {voice}")
326
+ audio_segment, final_sample_rate, spectrogram = infer_process(
327
  ref_audio_,
328
  ref_text_,
329
  gen_text_,
src/f5_tts/infer/utils_infer.py CHANGED
@@ -384,7 +384,7 @@ def infer_process(
384
  ):
385
  # Split the input text into batches
386
  audio, sr = torchaudio.load(ref_audio)
387
- max_chars = int(len(ref_text.encode("utf-8")) / (audio.shape[-1] / sr) * (22 - audio.shape[-1] / sr))
388
  gen_text_batches = chunk_text(gen_text, max_chars=max_chars)
389
  for i, gen_text in enumerate(gen_text_batches):
390
  print(f"gen_text {i}", gen_text)
 
384
  ):
385
  # Split the input text into batches
386
  audio, sr = torchaudio.load(ref_audio)
387
+ max_chars = int(len(ref_text.encode("utf-8")) / (audio.shape[-1] / sr) * (22 - audio.shape[-1] / sr) * speed)
388
  gen_text_batches = chunk_text(gen_text, max_chars=max_chars)
389
  for i, gen_text in enumerate(gen_text_batches):
390
  print(f"gen_text {i}", gen_text)
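The changed line in `infer_process` can be read as three factors: the byte rate of the reference text (UTF-8 bytes per second of reference audio), the time left after the reference clip, and the new `speed` multiplier. The sketch below restates that arithmetic as a standalone function so the effect of `speed` is easy to check. The function name `estimate_max_chars` and the parameter `total_seconds` are hypothetical names introduced here for illustration; the constant 22 is taken from the diff, and reading it as a total duration budget in seconds is an assumption.

```python
def estimate_max_chars(
    ref_text: str,
    ref_seconds: float,
    speed: float = 1.0,
    total_seconds: float = 22.0,  # constant from the diff; its meaning is assumed
) -> int:
    """Restate the max_chars expression from the diff without torchaudio.

    ref_seconds stands in for audio.shape[-1] / sr in the original line.
    """
    # UTF-8 bytes of reference text spoken per second of reference audio
    bytes_per_second = len(ref_text.encode("utf-8")) / ref_seconds
    # Time left for generated speech once the reference clip is accounted for
    remaining_seconds = total_seconds - ref_seconds
    # Faster speech fits more text in the same time, so the limit scales with speed
    return int(bytes_per_second * remaining_seconds * speed)


if __name__ == "__main__":
    for speed in (0.5, 1.0, 2.0):
        print(speed, estimate_max_chars("a" * 50, ref_seconds=5.0, speed=speed))
```

Because the length is measured in UTF-8 bytes rather than characters, a reference text in Chinese contributes about three bytes per character, so the resulting limit is not a strict character count.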
src/f5_tts/train/README.md CHANGED
@@ -1,5 +1,11 @@
1
  # Training
2
 
3
  ## Prepare Dataset
4
 
5
  Example data processing scripts, and you may tailor your own one along with a Dataset class in `src/f5_tts/model/dataset.py`.
 
1
  # Training
2
 
3
+ Check your FFmpeg installation:
4
+ ```bash
5
+ ffmpeg -version
6
+ ```
7
+ If not found, install it first (or skip assuming you know of other backends available).
8
+
9
  ## Prepare Dataset
10
 
11
  Example data processing scripts, and you may tailor your own one along with a Dataset class in `src/f5_tts/model/dataset.py`.
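
The FFmpeg check added to the training README in this commit (`ffmpeg -version`) can also be run from Python, for example at the start of a data preparation script. This is a minimal sketch using only the standard library, not something from the F5-TTS repository; the helper name `ffmpeg_version_line` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if it cannot be run."""
    # Look the executable up on PATH, as the shell would
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "-version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    line = ffmpeg_version_line()
    if line is None:
        print("FFmpeg not found; install it or make sure another audio backend is available.")
    else:
        print(line)
```

Returning `None` instead of raising keeps the check advisory, which matches the README's note that FFmpeg can be skipped when another backend is available.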