metadata
language: en
tags:
- text-to-speech
- StyleTTS2
- speech-synthesis
license: mit
pipeline_tag: text-to-speech
StyleTTS2 Fine-tuned Model
This repository contains a fine-tuned StyleTTS2 model together with all the components needed for inference.
Model Details
- Base Model: StyleTTS2-LibriTTS
- Architecture: StyleTTS2
- Task: Text-to-Speech
- Last Checkpoint: epoch_2nd_00003.pth
Training Details
- Total Epochs: 4
- Completed Epochs: 3
- Total Iterations: 310
- Batch Size: 2
- Max Length: 120
- Learning Rate: 0.0001
- Final Validation Loss: 0.416427
Model Components
The components are grouped as follows:
Main Model Components:
- bert.pth
- bert_encoder.pth
- predictor.pth
- decoder.pth
- text_encoder.pth
- predictor_encoder.pth
- style_encoder.pth
- diffusion.pth
- text_aligner.pth
- pitch_extractor.pth
- mpd.pth
- msd.pth
- wd.pth
Utility Components:
- ASR (Automatic Speech Recognition)
  - epoch_00080.pth
  - config.yml
  - models.py
  - layers.py
- JDC (F0 Prediction)
  - bst.t7
  - model.py
- PLBERT
  - step_1000000.t7
  - config.yml
  - util.py
Additional Files:
- text_utils.py: Text preprocessing utilities
- models.py: Model architecture definitions
- utils.py: Utility functions
- config.yml: Model configuration
- config.json: Detailed configuration and training metrics
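The keys inside config.json are not listed in this card, so a cautious first step is to inspect the file rather than assume its layout. The following is a minimal sketch, assuming config.json holds a JSON object at the top level; `summarize_json` is a hypothetical helper for illustration and is not part of the repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return {top-level key: type name of its value} for a JSON object file."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        # Assumption: the file stores a JSON object, not a list or scalar.
        raise ValueError(f"{path} does not contain a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}
```

Calling `summarize_json("config.json")` lists the top-level sections that are present, which shows where the training metrics are stored before any code depends on specific key names.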
Training Metrics
A plot of the training metrics is available in training_metrics.png.
Directory Structure
├── Utils/
│   ├── ASR/
│   ├── JDC/
│   └── PLBERT/
├── model_components/
└── configs/
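Before running inference, it can help to confirm that a local copy matches this layout. The sketch below assumes the utility files listed under Utility Components sit directly inside Utils/ASR, Utils/JDC, and Utils/PLBERT. That placement is inferred from the directory tree rather than stated explicitly, and `missing_utility_files` is a hypothetical helper.

```python
from pathlib import Path

# Expected utility files, copied from the "Utility Components" list.
# Placing them under Utils/<name>/ is an assumption based on the directory tree.
EXPECTED_UTILS = {
    "ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "JDC": ["bst.t7", "model.py"],
    "PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def missing_utility_files(repo_root):
    """Return the expected utility files that are absent under repo_root/Utils."""
    root = Path(repo_root)
    missing = []
    for subdir, filenames in EXPECTED_UTILS.items():
        for filename in filenames:
            candidate = root / "Utils" / subdir / filename
            if not candidate.is_file():
                # Report paths relative to the repository root, with forward slashes.
                missing.append(candidate.relative_to(root).as_posix())
    return missing
```

An empty list from `missing_utility_files(".")` means every listed utility file was found at the assumed location.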
Usage Instructions
- Load the model using the provided config.yml
- Ensure the utility components (ASR, JDC, PLBERT) are in their respective directories under Utils/
- Use text_utils.py for text preprocessing
- Follow the inference example in the StyleTTS2 documentation
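The first steps above can be sketched as follows. This is an illustration under stated assumptions, not the repository's own loading code: it assumes each file in the Main Model Components list stores a PyTorch state dict, that all of them sit in one directory, and that PyYAML and PyTorch are installed. The helper names (`component_paths`, `load_config`, `load_component_weights`) are hypothetical.

```python
from pathlib import Path

# Component names copied from the "Main Model Components" list.
COMPONENT_NAMES = [
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
]


def component_paths(model_dir):
    """Map each component name to its expected .pth file inside model_dir."""
    base = Path(model_dir)
    return {name: base / f"{name}.pth" for name in COMPONENT_NAMES}


def load_config(config_path):
    """Parse config.yml into plain Python objects (requires PyYAML)."""
    import yaml  # imported lazily so the path helper works without PyYAML

    with open(config_path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def load_component_weights(model_dir):
    """Load every component file onto the CPU (requires PyTorch).

    Assumption: each .pth file stores a state dict; this card does not confirm it.
    """
    import torch  # imported lazily for the same reason as yaml above

    return {
        name: torch.load(path, map_location="cpu")
        for name, path in component_paths(model_dir).items()
    }
```

The loaded weights still have to be applied to modules built from models.py, and text has to be preprocessed with text_utils.py. For those steps, the inference example in the StyleTTS2 documentation remains the reference.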