sinhprous/F5TTS-stabilized-LJSpeech

sinhprous committed 9 days ago

Commit 1092a85 · verified · 1 Parent(s): c4f8895

Update README.md

Files changed (1)

README.md +14 -9

@@ -1,31 +1,36 @@
 ---
 license: cc-by-nc-sa-4.0
 datasets:
-- mozilla-foundation/common_voice_17_0
-- bond005/sberdevices_golos_10h_crowd
-- bond005/sova_rudevices
-- Aniemore/resd_annotated
 language:
-- ru
 base_model:
 - SWivid/F5-TTS
 ---
 ## Overview
-The F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability, ensuring it avoids choppiness, mispronunciations, repetitions, and skipping words
-Differences from the original model: the phoneme alignment was used during training, whereas a duration predictor is used during inference.
 ## License
 This model is released under the Creative Commons Attribution Non Commercial Share Alike 4.0 license, which allows for free usage, modification, and distribution
 ## Model Information
 **Base Model**: SWivid/F5-TTS
-**Total Training Duration:** 250,000 steps
 **Training Configuration:**
 ```json
 "exp_name": "F5TTS_Base",
 "learning_rate": 1e-05,
-"batch_size_per_gpu": 4500,
 "batch_size_type": "frame",
 "max_samples": 64,
 "grad_accumulation_steps": 1,

 ---
 license: cc-by-nc-sa-4.0
 datasets:
+- LJSpeech
 language:
+- en
 base_model:
 - SWivid/F5-TTS
 ---
 ## Overview
+The F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability, ensuring it avoids choppiness, mispronunciations, repetitions, and skipping words.
+Differences from the original model: The text input is converted to phonemes rather than used as raw text. Phoneme alignment is used during training, whereas a duration predictor is used during inference.
+Source code for phoneme alignment: https://github.com/sinhprous/F5-TTS/blob/main/src/f5_tts/train/datasets/utils_alignment.py
+Source code for duration predictor: https://github.com/sinhprous/F5-TTS/blob/main/src/f5_tts/model/duration_predictor.py
+## Audio samples
 ## License
 This model is released under the Creative Commons Attribution Non Commercial Share Alike 4.0 license, which allows for free usage, modification, and distribution
 ## Model Information
 **Base Model**: SWivid/F5-TTS
+**Total Training Duration:** 130,000 steps
 **Training Configuration:**
 ```json
 "exp_name": "F5TTS_Base",
 "learning_rate": 1e-05,
+"batch_size_per_gpu": 2000,
 "batch_size_type": "frame",
 "max_samples": 64,
 "grad_accumulation_steps": 1,