Update README.md

Lines removed from the previous version (`@@ -1,31 +1,36 @@`):

```diff
--
-- bond005/sberdevices_golos_10h_crowd
-- bond005/sova_rudevices
-- Aniemore/resd_annotated
--
-The F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability, ensuring it avoids choppiness, mispronunciations, repetitions, and skipping words
-
-**Total Training Duration:**
-"batch_size_per_gpu":
```
Updated version:

---
license: cc-by-nc-sa-4.0
datasets:
- LJSpeech
language:
- en
base_model:
- SWivid/F5-TTS
---
## Overview
This model is SWivid/F5-TTS fine-tuned on the LJSpeech dataset with an emphasis on stability: the goal is to avoid choppiness, mispronunciations, repetitions, and skipped words.

Differences from the original model: the text input is converted to phonemes instead of being used as raw text. Phoneme alignment supplies the phoneme timing during training, whereas a duration predictor estimates it during inference, when there is no target audio to align against.
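To make the training/inference split concrete, here is a minimal, hypothetical sketch of one common way per-phoneme durations are consumed: FastSpeech-style length regulation, which repeats each phoneme once per mel frame it covers. The function name, the ARPAbet-style symbols, and the duration values are illustrative assumptions; whether this fork conditions F5-TTS in exactly this way should be checked against the source code linked in this section.

```python
from typing import List, Sequence


def expand_to_frames(tokens: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Length regulation: repeat each phoneme token once per mel frame it spans."""
    if len(tokens) != len(durations):
        raise ValueError("expected exactly one duration per phoneme")
    frames: List[str] = []
    for token, n_frames in zip(tokens, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([token] * n_frames)
    return frames


# Hypothetical phonemes for the word "hi" (ARPAbet-style symbols, illustration only).
phonemes = ["HH", "AY"]

# Training: per-phoneme frame counts would come from the phoneme alignment.
aligned_durations = [3, 5]
# Inference: the same slot is filled by the duration predictor's output.
predicted_durations = [4, 6]

train_frames = expand_to_frames(phonemes, aligned_durations)
infer_frames = expand_to_frames(phonemes, predicted_durations)
print(len(train_frames), len(infer_frames))  # 8 10
```

Both paths produce a frame-level phoneme sequence; only the source of the durations changes, which is why a predictor is needed once there is no reference audio to align.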

Source code for phoneme alignment: https://github.com/sinhprous/F5-TTS/blob/main/src/f5_tts/train/datasets/utils_alignment.py
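The linked `utils_alignment.py` is not reproduced here. As a hypothetical illustration of one step such alignment code typically needs, the sketch below converts aligner output in seconds into per-phoneme mel-frame counts. The 24 kHz sample rate and 256-sample hop are assumptions (F5-TTS base mel settings), and the function name and interval format are invented for this example.

```python
from typing import List, Sequence, Tuple

# Assumed mel-spectrogram settings (F5-TTS base defaults); adjust to your config.
SAMPLE_RATE = 24_000  # Hz
HOP_LENGTH = 256      # audio samples per mel frame


def intervals_to_frame_durations(
    intervals: Sequence[Tuple[float, float]],
    sample_rate: int = SAMPLE_RATE,
    hop_length: int = HOP_LENGTH,
) -> List[int]:
    """Convert time-ordered (start_s, end_s) phoneme intervals to frame counts.

    Each interval's end time is rounded to the nearest frame boundary, and a
    phoneme's duration is the difference between consecutive boundaries.
    Rounding boundaries (not durations) keeps the counts summing to the rounded
    span of the utterance. A gap between intervals is folded into the next phoneme.
    """
    if not intervals:
        return []
    frames_per_second = sample_rate / hop_length
    prev_boundary = round(intervals[0][0] * frames_per_second)
    durations: List[int] = []
    for start, end in intervals:
        if end < start:
            raise ValueError(f"interval ends before it starts: {(start, end)}")
        boundary = round(end * frames_per_second)
        if boundary < prev_boundary:
            raise ValueError("intervals must be sorted by time")
        durations.append(boundary - prev_boundary)
        prev_boundary = boundary
    return durations


# Hypothetical aligner output for three phonemes, in seconds.
example = [(0.0, 0.12), (0.12, 0.31), (0.31, 0.5)]
print(intervals_to_frame_durations(example))  # [11, 18, 18]
```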

Source code for duration predictor: https://github.com/sinhprous/F5-TTS/blob/main/src/f5_tts/model/duration_predictor.py
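We have not verified what the linked `duration_predictor.py` outputs. A common convention, assumed here, is that a duration predictor emits one log-duration per phoneme; the sketch shows how such outputs could become integer frame counts and a total output length. The function names, the `speed` knob, and the mel settings are all hypothetical.

```python
import math
from typing import List, Sequence

# Assumed mel settings (F5-TTS base defaults); used only to report seconds.
SAMPLE_RATE = 24_000
HOP_LENGTH = 256


def log_durations_to_frames(
    log_durations: Sequence[float], speed: float = 1.0, min_frames: int = 1
) -> List[int]:
    """Turn predicted per-phoneme log-durations into integer frame counts.

    speed > 1 shortens every phoneme (faster speech); speed < 1 lengthens it.
    Each phoneme keeps at least `min_frames` frames so none is silently dropped.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [max(min_frames, round(math.exp(ld) / speed)) for ld in log_durations]


def frames_to_seconds(n_frames: int) -> float:
    """Audio length covered by n_frames mel frames."""
    return n_frames * HOP_LENGTH / SAMPLE_RATE


# Hypothetical predictor outputs for three phonemes.
predicted = [math.log(4.0), math.log(9.6), -3.0]
frames = log_durations_to_frames(predicted)
print(frames, sum(frames), frames_to_seconds(sum(frames)))  # [4, 10, 1] 15 0.16
```

The total frame count is what fixes the length of the generated mel spectrogram at inference.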
## Audio samples

## License
This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license, which permits use, modification, and redistribution for non-commercial purposes only, provided that appropriate credit is given and derivative works are distributed under the same license.

## Model Information
**Base Model**: SWivid/F5-TTS

**Total Training Duration:** 130,000 steps
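The training configuration below uses frame-based batching (`"batch_size_type": "frame"`, `"batch_size_per_gpu": 2000`, `"max_samples": 64`), so each GPU batch is bounded by a mel-frame budget rather than a fixed number of utterances. The sketch illustrates that idea with a greedy packer; it is an assumption-laden illustration, not F5-TTS's own sampler, and the function name is hypothetical. Under the assumed 24 kHz / 256-hop mel settings, a 2,000-frame budget is roughly 21.3 s of audio per GPU per step.

```python
from typing import List, Sequence


def frame_batches(
    frame_lengths: Sequence[int],
    frames_per_batch: int = 2000,  # plays the role of "batch_size_per_gpu" in frame mode
    max_samples: int = 64,         # plays the role of "max_samples": cap on clips per batch
) -> List[List[int]]:
    """Greedily pack utterance indices into batches bounded by a frame budget.

    Utterances are sorted by length so each batch holds similarly sized clips
    (less padding). A batch is closed when adding the next clip would exceed the
    frame budget or the sample cap. Clips longer than the whole budget are
    skipped in this sketch.
    """
    order = sorted(range(len(frame_lengths)), key=lambda i: frame_lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_frames = 0
    for idx in order:
        n = frame_lengths[idx]
        if n > frames_per_batch:
            continue  # can never fit; a real sampler might log or truncate instead
        if current and (current_frames + n > frames_per_batch or len(current) >= max_samples):
            batches.append(current)
            current, current_frames = [], 0
        current.append(idx)
        current_frames += n
    if current:
        batches.append(current)
    return batches


# Hypothetical utterance lengths in mel frames.
print(frame_batches([500, 1200, 700, 300, 2500]))  # [[3, 0, 2], [1]]
```

Because the budget is in frames, short clips are batched many at a time and long clips few at a time, which keeps per-step memory roughly constant.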

**Training Configuration:**
```json
"exp_name": "F5TTS_Base",
"learning_rate": 1e-05,
"batch_size_per_gpu": 2000,
"batch_size_type": "frame",
"max_samples": 64,
"grad_accumulation_steps": 1,