metadata
tags:
- RVC
- RVC v2
- E-Celeb
- English
- RMVPE-GPU
- RMVPE
- 250Epochs
Introduction:
This repository contains high-quality voice models that aim to replicate the voices of celebrities, influencers, and other famous people. These models can be used freely with text-to-speech (TTS) software, voice changers, or audio-to-audio software.
Datasets:
All of the datasets used to train these models share the following properties:
1. Each is at least 20-25 minutes long and is collected from online videos, interview audio, blogs, and other recordings of the speaker (mostly interviews, as they are much easier to collect, edit, and polish).
2. Each is edited to contain only the best high-quality audio of the speaker's voice, with no background noise, music, silence, or other artifacts.
3. Each has a sample rate of 48 kHz, and training is also done at 48 kHz.
4. The recording and extraction process is sometimes not 100% perfect because background noise or music interferes. In some cases I may not reach the 20-25 minute mark because very little data is available; for those models I reduce the number of epochs to 200 to prevent overtraining and get the highest quality from a short dataset.
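The sample-rate and length targets above can be checked before training. The following is a minimal sketch, not the script used for these models: it assumes the dataset is a folder of PCM `.wav` files, and the function name `summarize_dataset` and the constant names are made up for illustration.

```python
import wave
from pathlib import Path

TARGET_RATE_HZ = 48_000   # sample rate stated in this README
MIN_TOTAL_MINUTES = 20    # lower end of the 20-25 minute target


def summarize_dataset(folder):
    """Return (total_minutes, names_of_files_not_at_48_kHz) for the .wav files in folder.

    Uses the standard-library wave module, which reads PCM WAV files only.
    """
    total_seconds = 0.0
    wrong_rate = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / rate
            if rate != TARGET_RATE_HZ:
                wrong_rate.append(path.name)
    return total_seconds / 60, wrong_rate
```

Calling `summarize_dataset("dataset")` (the folder name is hypothetical) returns the total length in minutes and the files that would need resampling. A total below `MIN_TOTAL_MINUTES` corresponds to the short-dataset case in item 4.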
Training:
All of the voice models are trained using:
1. The RMVPE_GPU pitch extraction algorithm.
2. The RVC v2 framework.
3. Pitch guidance.
4. 250-300 total epochs, with total steps ranging from a minimum of 1,000 to a maximum of around 6,000.
5. Software: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
6. Hardware: a single NVIDIA RTX 3090 GPU and a 12th Gen Intel Core i7-12700K CPU.
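The epoch and step counts above are related through the dataset size. The following is a rough sketch of that arithmetic, assuming one training step per batch and that a partial final batch still counts as a step. The segment counts and batch size in the examples are hypothetical values chosen to land on the figures quoted above; they are not the actual settings used for these models.

```python
import math


def total_steps(epochs, num_segments, batch_size):
    """Estimate total training steps, assuming one step per batch."""
    # A partial final batch is assumed to count as a full step.
    steps_per_epoch = math.ceil(num_segments / batch_size)
    return epochs * steps_per_epoch


# Hypothetical short dataset: 32 audio segments, batch size 8, 250 epochs.
short_run = total_steps(epochs=250, num_segments=32, batch_size=8)

# Hypothetical longer dataset: 160 audio segments, batch size 8, 300 epochs.
long_run = total_steps(epochs=300, num_segments=160, batch_size=8)
```

Under these assumptions a longer dataset yields more steps per epoch, which is why the total step count varies so widely across models even though the epoch count stays within 250-300.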
Remember:
I don't own any of the content used for dataset creation or voice model training. As such, I am not responsible for any misuse or abuse of this content. All of this was produced for educational purposes and personal use, not with malicious intent. Use these voice models at your own risk, and enjoy!