sinhprous committed · Commit 1092a85 · verified · 1 Parent(s): c4f8895

Update README.md

Files changed (1): README.md +14 -9
README.md CHANGED
@@ -1,31 +1,36 @@
1
  ---
2
  license: cc-by-nc-sa-4.0
3
  datasets:
4
- - mozilla-foundation/common_voice_17_0
5
- - bond005/sberdevices_golos_10h_crowd
6
- - bond005/sova_rudevices
7
- - Aniemore/resd_annotated
8
  language:
9
- - ru
10
  base_model:
11
  - SWivid/F5-TTS
12
  ---
13
  ## Overview
14
- The F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability, ensuring it avoids choppiness, mispronunciations, repetitions, and skipping words
15
- Differences from the original model: the phoneme alignment was used during training, whereas a duration predictor is used during inference.
16
 
17
  ## License
18
  This model is released under the Creative Commons Attribution Non Commercial Share Alike 4.0 license, which allows for free usage, modification, and distribution
19
 
20
  ## Model Information
21
  **Base Model**: SWivid/F5-TTS
22
- **Total Training Duration:** 250,000 steps
23
 
24
  **Training Configuration:**
25
  ```json
26
  "exp_name": "F5TTS_Base",
27
  "learning_rate": 1e-05,
28
- "batch_size_per_gpu": 4500,
29
  "batch_size_type": "frame",
30
  "max_samples": 64,
31
  "grad_accumulation_steps": 1,
 
1
  ---
2
  license: cc-by-nc-sa-4.0
3
  datasets:
4
+ - LJSpeech
5
  language:
6
+ - en
7
  base_model:
8
  - SWivid/F5-TTS
9
  ---
10
  ## Overview
11
+ The F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability, ensuring it avoids choppiness, mispronunciations, repetitions, and skipping words.
12
+
13
+ Differences from the original model: the text input is converted to phonemes instead of being used as raw text. Phoneme alignment is used during training, whereas a duration predictor is used during inference.
14
+
15
+ Source code for phoneme alignment: https://github.com/sinhprous/F5-TTS/blob/main/src/f5_tts/train/datasets/utils_alignment.py
16
+
17
+ Source code for duration predictor: https://github.com/sinhprous/F5-TTS/blob/main/src/f5_tts/model/duration_predictor.py
18
+
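To illustrate the training/inference difference described in the Overview, here is a minimal sketch of how per-phoneme durations can stretch a phoneme sequence to frame level. It is not taken from the linked repository: the function `expand_by_duration`, the phoneme symbols, and the duration values are all hypothetical. In a setup like the one described, the durations would come from the phoneme alignment during training and from the duration predictor at inference.

```python
from typing import List, Sequence


def expand_by_duration(phonemes: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each phoneme for the number of frames given by its duration.

    Hypothetical helper for illustration only; the real project may
    represent phonemes and durations differently.
    """
    if len(phonemes) != len(durations):
        raise ValueError("phonemes and durations must have the same length")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")

    frames: List[str] = []
    for phoneme, n_frames in zip(phonemes, durations):
        frames.extend([phoneme] * n_frames)
    return frames


# Made-up example: four phonemes with frame counts that could come
# either from an aligner (training) or a duration predictor (inference).
phonemes = ["HH", "AH", "L", "OW"]
durations = [3, 5, 2, 6]
aligned = expand_by_duration(phonemes, durations)
print(len(aligned), aligned[:4])
```

The same expansion works for either source of durations, so only the component that supplies `durations` changes between training and inference in this sketch.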
19
+ ## Audio samples
20
+
21
 
22
  ## License
23
  This model is released under the Creative Commons Attribution Non Commercial Share Alike 4.0 license, which allows for free usage, modification, and distribution
24
 
25
  ## Model Information
26
  **Base Model**: SWivid/F5-TTS
27
+ **Total Training Duration:** 130,000 steps
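Because `batch_size_type` is `frame`, the `batch_size_per_gpu` value in the training configuration that follows counts mel frames rather than utterances. As a rough sketch, assuming the upstream F5-TTS defaults of a 24 kHz sample rate and a hop length of 256 samples (this commit does not confirm them for this fine-tune), a frame budget converts to seconds of audio like this:

```python
# Assumed audio settings (upstream F5-TTS defaults, not confirmed by this commit).
SAMPLE_RATE = 24_000  # samples per second
HOP_LENGTH = 256      # samples between consecutive mel frames


def frames_to_seconds(n_frames: int,
                      sample_rate: int = SAMPLE_RATE,
                      hop_length: int = HOP_LENGTH) -> float:
    """Convert a number of mel frames to an approximate audio duration."""
    return n_frames * hop_length / sample_rate


# Per-GPU frame budgets before and after this commit.
old_budget = frames_to_seconds(4500)
new_budget = frames_to_seconds(2000)
print(f"4500 frames ~ {old_budget:.1f} s, 2000 frames ~ {new_budget:.1f} s")
```

Under these assumptions, the change from 4500 to 2000 frames reduces the audio packed into each per-GPU batch from about 48 seconds to about 21 seconds.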
28
 
29
  **Training Configuration:**
30
  ```json
31
  "exp_name": "F5TTS_Base",
32
  "learning_rate": 1e-05,
33
+ "batch_size_per_gpu": 2000,
34
  "batch_size_type": "frame",
35
  "max_samples": 64,
36
  "grad_accumulation_steps": 1,