---
license: cc-by-nc-4.0
pipeline_tag: text-to-speech
library_name: f5-tts
datasets:
- amphion/Emilia-Dataset
language:
- de
tags:
- tts
- audio
- german
- mlx
---

Copied from https://huggingface.co/marduk-ra/F5-TTS-German, with an added duration model trained on the Emilia dataset using https://github.com/eamag/f5-tts-duration.

Run inference with https://github.com/lucasnewman/f5-tts-mlx:

```bash
python -m f5_tts_mlx.generate --model "eamag/f5-tts-mlx-german" \
  --text "The quick brown fox jumped over the lazy dog." \
  --ref-audio /path/to/audio.wav \
  --ref-text "This is the caption for the reference audio."
```
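
The `--ref-audio` clip should match the audio format the model works with. F5-TTS operates on 24 kHz audio. Assuming the MLX port expects a mono 24 kHz WAV as reference (check the f5-tts-mlx README for its exact requirements), the helper below is a minimal sketch that reads the channel count and sample rate straight from the WAV header using only `head`, `dd`, and `od`. The function name `check_ref_audio` is hypothetical and is not part of f5-tts-mlx. It also assumes the `fmt ` chunk directly follows the `RIFF`/`WAVE` header, which most WAV writers do but the format does not guarantee.

```bash
#!/usr/bin/env bash
# check_ref_audio.sh: hypothetical helper, NOT part of f5-tts-mlx.
# Assumptions: the model wants mono, 24 kHz reference audio, and the WAV file
# stores its "fmt " chunk right after the RIFF/WAVE header (offset 12).

# check_ref_audio FILE
# Prints "channels=N rate=R"; returns 0 for mono 24 kHz, 1 otherwise,
# and 2 if the header does not have the expected canonical layout.
check_ref_audio() {
  local f="$1"

  # Bytes 0-3 must read "RIFF" and bytes 8-15 "WAVEfmt ".
  if [[ "$(head -c 4 "$f" 2>/dev/null)" != "RIFF" ]] ||
     [[ "$(dd if="$f" bs=1 skip=8 count=8 2>/dev/null)" != "WAVEfmt " ]]; then
    echo "not a canonical WAV file: $f" >&2
    return 2
  fi

  # Channels: 2 bytes at offset 22; sample rate: 4 bytes at offset 24.
  # Both are little-endian; reading single unsigned bytes with od avoids
  # depending on the host's byte order.
  local -a b
  read -r -a b <<< "$(od -An -tu1 -j22 -N6 "$f" 2>/dev/null)"
  if (( ${#b[@]} < 6 )); then
    echo "truncated WAV header: $f" >&2
    return 2
  fi

  local channels=$(( b[0] + 256 * b[1] ))
  local rate=$(( b[2] + 256 * b[3] + 65536 * b[4] + 16777216 * b[5] ))
  echo "channels=$channels rate=$rate"

  # Success only for mono audio at 24000 Hz.
  (( channels == 1 && rate == 24000 ))
}
```

Usage: `source check_ref_audio.sh && check_ref_audio /path/to/audio.wav` prints the detected values. It returns non-zero when the clip is not mono 24 kHz, and 2 when the header is not in the expected layout.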

Original F5-TTS code on GitHub: https://github.com/SWivid/F5-TTS

Paper: [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://huggingface.co/papers/2410.06885)

> **_NOTE:_** You can set the number of NFE (number of function evaluations) steps to 64 to produce higher-quality audio, at the cost of slower generation.