atsushieee committed
Commit 9791162 · 1 Parent(s): 827297f

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .DS_Store +0 -0
  2. .gitattributes +2 -0
  3. .python-version +1 -0
  4. LICENSE +21 -0
  5. README.md +493 -5
  6. README_ZH.md +418 -0
  7. __pycache__/svc_inference.cpython-310.pyc +0 -0
  8. app.py +444 -0
  9. colab.ipynb +374 -0
  10. configs/base.yaml +72 -0
  11. configs/singers/singer0001.npy +3 -0
  12. configs/singers/singer0002.npy +3 -0
  13. configs/singers/singer0003.npy +3 -0
  14. configs/singers/singer0004.npy +3 -0
  15. configs/singers/singer0005.npy +3 -0
  16. configs/singers/singer0006.npy +3 -0
  17. configs/singers/singer0007.npy +3 -0
  18. configs/singers/singer0008.npy +3 -0
  19. configs/singers/singer0009.npy +3 -0
  20. configs/singers/singer0010.npy +3 -0
  21. configs/singers/singer0011.npy +3 -0
  22. configs/singers/singer0012.npy +3 -0
  23. configs/singers/singer0013.npy +3 -0
  24. configs/singers/singer0014.npy +3 -0
  25. configs/singers/singer0015.npy +3 -0
  26. configs/singers/singer0016.npy +3 -0
  27. configs/singers/singer0017.npy +3 -0
  28. configs/singers/singer0018.npy +3 -0
  29. configs/singers/singer0019.npy +3 -0
  30. configs/singers/singer0020.npy +3 -0
  31. configs/singers/singer0021.npy +3 -0
  32. configs/singers/singer0022.npy +3 -0
  33. configs/singers/singer0023.npy +3 -0
  34. configs/singers/singer0024.npy +3 -0
  35. configs/singers/singer0025.npy +3 -0
  36. configs/singers/singer0026.npy +3 -0
  37. configs/singers/singer0027.npy +3 -0
  38. configs/singers/singer0028.npy +3 -0
  39. configs/singers/singer0029.npy +3 -0
  40. configs/singers/singer0030.npy +3 -0
  41. configs/singers/singer0031.npy +3 -0
  42. configs/singers/singer0032.npy +3 -0
  43. configs/singers/singer0033.npy +3 -0
  44. configs/singers/singer0034.npy +3 -0
  45. configs/singers/singer0035.npy +3 -0
  46. configs/singers/singer0036.npy +3 -0
  47. configs/singers/singer0037.npy +3 -0
  48. configs/singers/singer0038.npy +3 -0
  49. configs/singers/singer0039.npy +3 -0
  50. configs/singers/singer0040.npy +3 -0
.DS_Store ADDED
Binary file (6.15 kB)
 
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ test.wav filter=lfs diff=lfs merge=lfs -text
+ vad/assets/silero_vad.jit filter=lfs diff=lfs merge=lfs -text
.python-version ADDED
@@ -0,0 +1 @@
+ 3.10.9
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 PlayVoice
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,500 @@
  ---
- title: Sovits Test
- emoji: 📚
+ title: Whisper Vits SVC
+ emoji: 🎵
+ python_version: 3.10.12
  colorFrom: blue
  colorTo: purple
  sdk: gradio
- sdk_version: 5.8.0
- app_file: app.py
+ sdk_version: 5.7.1
+ app_file: main.py
  pinned: false
+ license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ <div align="center">
15
+ <h1> Variational Inference with adversarial learning for end-to-end Singing Voice Conversion based on VITS </h1>
16
+
17
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/maxmax20160403/sovits5.0)
18
+ <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/PlayVoice/so-vits-svc-5.0">
19
+ <img alt="GitHub forks" src="https://img.shields.io/github/forks/PlayVoice/so-vits-svc-5.0">
20
+ <img alt="GitHub issues" src="https://img.shields.io/github/issues/PlayVoice/so-vits-svc-5.0">
21
+ <img alt="GitHub" src="https://img.shields.io/github/license/PlayVoice/so-vits-svc-5.0">
22
+
23
+ [中文文档](./README_ZH.md)
24
+
25
+ The tree [bigvgan-mix-v2](https://github.com/PlayVoice/whisper-vits-svc/tree/bigvgan-mix-v2) has good audio quality
26
+
27
+ The tree [RoFormer-HiFTNet](https://github.com/PlayVoice/whisper-vits-svc/tree/RoFormer-HiFTNet) has fast infer speed
28
+
29
+ No More Upgrade
30
+
31
+ </div>
32
+
33
+ - This project targets deep learning beginners, basic knowledge of Python and PyTorch are the prerequisites for this project;
34
+ - This project aims to help deep learning beginners get rid of boring pure theoretical learning, and master the basic knowledge of deep learning by combining it with practices;
35
+ - This project does not support real-time voice converting; (need to replace whisper if real-time voice converting is what you are looking for)
36
+ - This project will not develop one-click packages for other purposes;
37
+
38
+ ![vits-5.0-frame](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/3854b281-8f97-4016-875b-6eb663c92466)
39
+
40
+ - A minimum VRAM requirement of 6GB for training
41
+
42
+ - Support for multiple speakers
43
+
44
+ - Create unique speakers through speaker mixing
45
+
46
+ - It can even convert voices with light accompaniment
47
+
48
+ - You can edit F0 using Excel
49
+
50
+ https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/6a09805e-ab93-47fe-9a14-9cbc1e0e7c3a
51
+
52
+ Powered by [@ShadowVap](https://space.bilibili.com/491283091)
53
+
54
+ ## Model properties
55
+
56
+ | Feature | From | Status | Function |
57
+ | :--- | :--- | :--- | :--- |
58
+ | whisper | OpenAI | ✅ | strong noise immunity |
59
+ | bigvgan | NVIDA | ✅ | alias and snake | The formant is clearer and the sound quality is obviously improved |
60
+ | natural speech | Microsoft | ✅ | reduce mispronunciation |
61
+ | neural source-filter | Xin Wang | ✅ | solve the problem of audio F0 discontinuity |
62
+ | pitch quantization | Xin Wang | ✅ | quantize the F0 for embedding |
63
+ | speaker encoder | Google | ✅ | Timbre Encoding and Clustering |
64
+ | GRL for speaker | Ubisoft |✅ | Preventing Encoder Leakage Timbre |
65
+ | SNAC | Samsung | ✅ | One Shot Clone of VITS |
66
+ | SCLN | Microsoft | ✅ | Improve Clone |
67
+ | Diffusion | HuaWei | ✅ | Improve sound quality |
68
+ | PPG perturbation | this project | ✅ | Improved noise immunity and de-timbre |
69
+ | HuBERT perturbation | this project | ✅ | Improved noise immunity and de-timbre |
70
+ | VAE perturbation | this project | ✅ | Improve sound quality |
71
+ | MIX encoder | this project | ✅ | Improve conversion stability |
72
+ | USP infer | this project | ✅ | Improve conversion stability |
73
+ | HiFTNet | Columbia University | ✅ | NSF-iSTFTNet for speed up |
74
+ | RoFormer | Zhuiyi Technology | ✅ | Rotary Positional Embeddings |
75
+
76
+ due to the use of data perturbation, it takes longer to train than other projects.
77
+
78
+ **USP : Unvoice and Silence with Pitch when infer**
79
+ ![vits_svc_usp](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/ba733b48-8a89-4612-83e0-a0745587d150)
80
+
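+ Read literally, USP means unvoiced and silent frames still carry an F0 at inference instead of zeros. A toy sketch of one way to achieve that, by interpolating pitch across unvoiced gaps (this is an assumption about the idea, not the project's actual code; it assumes at least one voiced frame):
+
+ ```python
+ import numpy as np
+
+ def fill_unvoiced(f0: np.ndarray) -> np.ndarray:
+     # Replace zero (unvoiced/silent) frames with values interpolated
+     # from the surrounding voiced frames.
+     voiced = np.flatnonzero(f0 > 0)
+     return np.interp(np.arange(len(f0)), voiced, f0[voiced])
+ ```
+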
+ ## Why mix
+
+ ![mix_frame](https://github.com/PlayVoice/whisper-vits-svc/assets/16432329/3ffa1be0-1a21-4752-96b5-6220f98f2313)
+
+ ## Plug-In-Diffusion
+
+ ![plug-in-diffusion](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/54a61c90-a97b-404d-9cc9-a2151b2db28f)
+
+ ## Setup Environment
+
+ 1. Install [PyTorch](https://pytorch.org/get-started/locally/).
+
+ 2. Install project dependencies
+ ```shell
+ pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
+ ```
+ **Note: whisper is already built in; do not install it separately, otherwise it will cause conflicts and errors**
+ 3. Download the Timbre Encoder: [Speaker-Encoder by @mueller91](https://drive.google.com/drive/folders/15oeBYf6Qn1edONkVLXe82MzdIi3O_9m3), and put `best_model.pth.tar` into `speaker_pretrain/`.
+
+ 4. Download the whisper model [whisper-large-v2](https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt). Make sure to download `large-v2.pt`, and put it into `whisper_pretrain/`.
+
+ 5. Download the [hubert_soft model](https://github.com/bshall/hubert/releases/tag/v0.1), and put `hubert-soft-0d54a1f4.pt` into `hubert_pretrain/`.
+
+ 6. Download the pitch extractor [crepe full](https://github.com/maxrmorrison/torchcrepe/tree/master/torchcrepe/assets), and put `full.pth` into `crepe/assets`.
+
+ **Note: crepe full.pth is 84.9 MB, not 6 KB; a 6 KB file is only the Git LFS pointer**
+
+ 7. Download the pretrained model [sovits5.0.pretrain.pth](https://github.com/PlayVoice/so-vits-svc-5.0/releases/tag/5.0/), put it into `vits_pretrain/`, and run an inference test:
+ ```shell
+ python svc_inference.py --config configs/base.yaml --model ./vits_pretrain/sovits5.0.pretrain.pth --spk ./configs/singers/singer0001.npy --wave test.wav
+ ```
+
+ ## Dataset preparation
+
+ Necessary pre-processing:
+ 1. Separate vocals and accompaniment with [UVR](https://github.com/Anjok07/ultimatevocalremovergui) (skip if there is no accompaniment)
+ 2. Cut the audio into shorter clips with [slicer](https://github.com/flutydeer/audio-slicer); whisper requires inputs of less than 30 seconds.
+ 3. Manually check the generated clips; remove any shorter than 2 seconds or with obvious noise.
+ 4. Adjust the loudness if necessary; Adobe Audition is recommended.
+ 5. Put the dataset into the `dataset_raw` directory following the structure below.
+ ```
+ dataset_raw
+ ├───speaker0
+ │   ├───000001.wav
+ │   ├───...
+ │   └───000xxx.wav
+ └───speaker1
+     ├───000001.wav
+     ├───...
+     └───000xxx.wav
+ ```
+
+ ## Data preprocessing
+ ```shell
+ python svc_preprocessing.py -t 2
+ ```
+ `-t`: number of threads; it must not exceed the CPU core count, and 2 is usually enough.
+ After preprocessing you will get output with the following structure.
+ ```
+ data_svc/
+ └── waves-16k
+ │    └── speaker0
+ │    │     ├── 000001.wav
+ │    │     └── 000xxx.wav
+ │    └── speaker1
+ │          ├── 000001.wav
+ │          └── 000xxx.wav
+ └── waves-32k
+ │    └── speaker0
+ │    │     ├── 000001.wav
+ │    │     └── 000xxx.wav
+ │    └── speaker1
+ │          ├── 000001.wav
+ │          └── 000xxx.wav
+ └── pitch
+ │    └── speaker0
+ │    │     ├── 000001.pit.npy
+ │    │     └── 000xxx.pit.npy
+ │    └── speaker1
+ │          ├── 000001.pit.npy
+ │          └── 000xxx.pit.npy
+ └── hubert
+ │    └── speaker0
+ │    │     ├── 000001.vec.npy
+ │    │     └── 000xxx.vec.npy
+ │    └── speaker1
+ │          ├── 000001.vec.npy
+ │          └── 000xxx.vec.npy
+ └── whisper
+ │    └── speaker0
+ │    │     ├── 000001.ppg.npy
+ │    │     └── 000xxx.ppg.npy
+ │    └── speaker1
+ │          ├── 000001.ppg.npy
+ │          └── 000xxx.ppg.npy
+ └── speaker
+ │    └── speaker0
+ │    │     ├── 000001.spk.npy
+ │    │     └── 000xxx.spk.npy
+ │    └── speaker1
+ │          ├── 000001.spk.npy
+ │          └── 000xxx.spk.npy
+ └── singer
+ │    ├── speaker0.spk.npy
+ │    └── speaker1.spk.npy
+ |
+ └── indexes
+      ├── speaker0
+      │     ├── some_prefix_hubert.index
+      │     └── some_prefix_whisper.index
+      └── speaker1
+            ├── hubert.index
+            └── whisper.index
+ ```
+
+ 1. Re-sampling
+    - Generate audio with a sampling rate of 16000 Hz in `./data_svc/waves-16k`
+    ```
+    python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-16k -s 16000
+    ```
+    - Generate audio with a sampling rate of 32000 Hz in `./data_svc/waves-32k`
+    ```
+    python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-32k -s 32000
+    ```
+ 2. Use the 16k audio to extract the pitch
+    ```
+    python prepare/preprocess_crepe.py -w data_svc/waves-16k/ -p data_svc/pitch
+    ```
+ 3. Use the 16k audio to extract the ppg
+    ```
+    python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper
+    ```
+ 4. Use the 16k audio to extract the hubert vectors
+    ```
+    python prepare/preprocess_hubert.py -w data_svc/waves-16k/ -v data_svc/hubert
+    ```
+ 5. Use the 16k audio to extract the timbre code
+    ```
+    python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker
+    ```
+ 6. Extract the average of the timbre codes for inference; the average can also replace the per-utterance timbre when generating the training index, acting as the speaker's unified timbre for training (a minimal sketch of this averaging follows this list)
+    ```
+    python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer
+    ```
+ 7. Use the 32k audio to extract the linear spectrogram
+    ```
+    python prepare/preprocess_spec.py -w data_svc/waves-32k/ -s data_svc/specs
+    ```
+ 8. Use the 32k audio to generate the training index
+    ```
+    python prepare/preprocess_train.py
+    ```
+ 9. Training file debugging
+    ```
+    python prepare/preprocess_zzz.py
+    ```
+
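+ Step 6 above is presumably just a mean over the per-utterance speaker embeddings; a minimal sketch, assuming each `*.spk.npy` file holds a single embedding vector (`average_speaker` is a hypothetical helper, not part of the project):
+
+ ```python
+ import os
+ import numpy as np
+
+ def average_speaker(spk_dir: str, out_path: str) -> None:
+     # Average all per-utterance speaker embeddings into one unified timbre.
+     vecs = [np.load(os.path.join(spk_dir, f))
+             for f in sorted(os.listdir(spk_dir)) if f.endswith(".spk.npy")]
+     np.save(out_path, np.mean(vecs, axis=0))
+
+ average_speaker("data_svc/speaker/speaker0", "data_svc/singer/speaker0.spk.npy")
+ ```
+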
+ ## Train
+ 1. If fine-tuning from the pre-trained model, first download it: [sovits5.0.pretrain.pth](https://github.com/PlayVoice/so-vits-svc-5.0/releases/tag/5.0). Put the pretrained model under the project root, change this line
+ ```
+ pretrain: "./vits_pretrain/sovits5.0.pretrain.pth"
+ ```
+ in `configs/base.yaml`, and lower the learning rate appropriately, e.g. to 5e-5.
+
+    `batch_size`: for a GPU with 6GB VRAM, 6 is the recommended value; 8 will work, but step speed will be much slower.
+ 2. Start training
+ ```
+ python svc_trainer.py -c configs/base.yaml -n sovits5.0
+ ```
+ 3. Resume training
+ ```
+ python svc_trainer.py -c configs/base.yaml -n sovits5.0 -p chkpt/sovits5.0/sovits5.0_***.pt
+ ```
+ 4. Log visualization
+ ```
+ tensorboard --logdir logs/
+ ```
+
+ ![sovits5 0_base](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/1628e775-5888-4eac-b173-a28dca978faa)
+
+ ![sovits_spec](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/c4223cf3-b4a0-4325-bec0-6d46d195a1fc)
+
+ ## Inference
+
+ 1. Export the inference model: text encoder, Flow network, and Decoder network
+ ```
+ python svc_export.py --config configs/base.yaml --checkpoint_path chkpt/sovits5.0/***.pt
+ ```
+ 2. Inference
+ - If there is no need to adjust `f0`, just run the following command.
+ ```
+ python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --shift 0
+ ```
+ - If `f0` will be adjusted manually, follow these steps:
+   1. Use whisper to extract the content encoding, generating `test.ppg.npy`.
+   ```
+   python whisper/inference.py -w test.wav -p test.ppg.npy
+   ```
+   2. Use hubert to extract the content vector; this is done outside of one-click inference in order to reduce GPU memory usage
+   ```
+   python hubert/inference.py -w test.wav -v test.vec.npy
+   ```
+   3. Extract the F0 parameters to CSV text format, open the CSV file in Excel, and manually correct wrong F0 values by referring to Audition or SonicVisualiser
+   ```
+   python pitch/inference.py -w test.wav -p test.csv
+   ```
+   4. Final inference
+   ```
+   python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --ppg test.ppg.npy --vec test.vec.npy --pit test.csv --shift 0
+   ```
+ 3. Notes
+
+ - When `--ppg` is specified, repeated inference on the same audio skips re-extracting the content encoding; if it is not specified, it is extracted automatically;
+
+ - When `--vec` is specified, repeated inference on the same audio skips re-extracting the content vector; if it is not specified, it is extracted automatically;
+
+ - When `--pit` is specified, the manually tuned F0 parameters are loaded; if it is not specified, the pitch is extracted automatically;
+
+ - The output file `svc_out.wav` is generated in the current directory;
+
+ 4. Arguments ref
+
+ | args | --config | --model | --spk | --wave | --ppg | --vec | --pit | --shift |
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | name | config path | model path | speaker | wave input | wave ppg | wave hubert | wave pitch | pitch shift |
+
+ 5. Post-processing with VAD
+ ```
+ python svc_inference_post.py --ref test.wav --svc svc_out.wav --out svc_out_post.wav
+ ```
+
+ ## Train Feature Retrieval Index (Optional)
314
+
315
+ To increase the stability of the generated timbre, you can use the method described in the
316
+ [Retrieval-based-Voice-Conversion](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md)
317
+ repository. This method consists of 2 steps:
318
+
319
+ 1. Training the retrieval index on hubert and whisper features
320
+ Run training with default settings:
321
+ ```
322
+ python svc_train_retrieval.py
323
+ ```
324
+
325
+ If the number of vectors is more than 200_000 they will be compressed to 10_000 using the MiniBatchKMeans algorithm.
326
+ You can change these settings using command line options:
327
+ ```
328
+ usage: crate faiss indexes for feature retrieval [-h] [--debug] [--prefix PREFIX] [--speakers SPEAKERS [SPEAKERS ...]] [--compress-features-after COMPRESS_FEATURES_AFTER]
329
+ [--n-clusters N_CLUSTERS] [--n-parallel N_PARALLEL]
330
+
331
+ options:
332
+ -h, --help show this help message and exit
333
+ --debug
334
+ --prefix PREFIX add prefix to index filename
335
+ --speakers SPEAKERS [SPEAKERS ...]
336
+ speaker names to create an index. By default all speakers are from data_svc
337
+ --compress-features-after COMPRESS_FEATURES_AFTER
338
+ If the number of features is greater than the value compress feature vectors using MiniBatchKMeans.
339
+ --n-clusters N_CLUSTERS
340
+ Number of centroids to which features will be compressed
341
+ --n-parallel N_PARALLEL
342
+ Nuber of parallel job of MinibatchKmeans. Default is cpus-1
343
+ ```
344
+ Compression of training vectors can speed up index inference, but reduces the quality of the retrieve.
345
+ Use vector count compression if you really have a lot of them.
346
+
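+ Conceptually, the compression step is an ordinary k-means reduction applied before the faiss index is built; a minimal sketch, assuming the features arrive as a float32 matrix of shape (n, dim) (the function name and defaults are illustrative, not the project's exact code):
+
+ ```python
+ import numpy as np
+ import faiss
+ from sklearn.cluster import MiniBatchKMeans
+
+ def build_retrieval_index(feats: np.ndarray, compress_after: int = 200_000,
+                           n_clusters: int = 10_000) -> faiss.Index:
+     feats = feats.astype(np.float32)
+     # Replace the raw vectors with k-means centroids when there are too many.
+     if len(feats) > compress_after:
+         km = MiniBatchKMeans(n_clusters=n_clusters, batch_size=4096)
+         feats = km.fit(feats).cluster_centers_.astype(np.float32)
+     index = faiss.IndexFlatL2(feats.shape[1])  # exact nearest-neighbour search
+     index.add(feats)
+     return index
+ ```
+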
+ The resulting indexes will be stored in the "indexes" folder as:
+ ```
+ data_svc
+ ...
+ └── indexes
+     ├── speaker0
+     │     ├── some_prefix_hubert.index
+     │     └── some_prefix_whisper.index
+     └── speaker1
+           ├── hubert.index
+           └── whisper.index
+ ```
+ 2. At the inference stage, the n closest features are mixed into the VITS input in a configurable proportion (a sketch of the blend follows this step)
+    Enable feature retrieval with:
+ ```
+ python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --shift 0 \
+ --enable-retrieval \
+ --retrieval-ratio 0.5 \
+ --n-retrieval-vectors 3
+ ```
+ For a better retrieval effect, you can experiment with different values of `--retrieval-ratio` and `--n-retrieval-vectors`
+
+ If you have multiple sets of indexes, you can select a specific set via the parameter `--retrieval-index-prefix`
+
+ You can explicitly specify the paths to the hubert and whisper indexes using the parameters `--hubert-index-path` and `--whisper-index-path`
+
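+ The blend itself is plausibly a per-frame nearest-neighbour lookup followed by linear interpolation; a minimal sketch, assuming `stored` is the same float32 matrix that was added to the index (`retrieval_blend` is a hypothetical helper, not the project's exact code):
+
+ ```python
+ import numpy as np
+ import faiss
+
+ def retrieval_blend(feats: np.ndarray, index: faiss.Index, stored: np.ndarray,
+                     ratio: float = 0.5, k: int = 3) -> np.ndarray:
+     # For each frame, average its k nearest training vectors,
+     # then mix them back into the original features.
+     _, idx = index.search(feats.astype(np.float32), k)
+     retrieved = stored[idx].mean(axis=1)
+     return ratio * retrieved + (1.0 - ratio) * feats
+ ```
+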
+
+ ## Create singer
+ Named by pure coincidence: average -> ave -> eva; Eve (eva) represents conception and reproduction
+
+ ```
+ python svc_eva.py
+ ```
+
+ ```python
+ eva_conf = {
+     './configs/singers/singer0022.npy': 0,
+     './configs/singers/singer0030.npy': 0,
+     './configs/singers/singer0047.npy': 0.5,
+     './configs/singers/singer0051.npy': 0.5,
+ }
+ ```
+
+ The generated singer file will be `eva.spk.npy`.
+
392
+ ## Data set
393
+
394
+ | Name | URL |
395
+ | :--- | :--- |
396
+ |KiSing |http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/|
397
+ |PopCS |https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md|
398
+ |opencpop |https://wenet.org.cn/opencpop/download/|
399
+ |Multi-Singer |https://github.com/Multi-Singer/Multi-Singer.github.io|
400
+ |M4Singer |https://github.com/M4Singer/M4Singer/blob/master/apply_form.md|
401
+ |CSD |https://zenodo.org/record/4785016#.YxqrTbaOMU4|
402
+ |KSS |https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset|
403
+ |JVS MuSic |https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music|
404
+ |PJS |https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus|
405
+ |JUST Song |https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song|
406
+ |MUSDB18 |https://sigsep.github.io/datasets/musdb.html#musdb18-compressed-stems|
407
+ |DSD100 |https://sigsep.github.io/datasets/dsd100.html|
408
+ |Aishell-3 |http://www.aishelltech.com/aishell_3|
409
+ |VCTK |https://datashare.ed.ac.uk/handle/10283/2651|
410
+ |Korean Songs |http://urisori.co.kr/urisori-en/doku.php/|
411
+
412
+ ## Code sources and references
413
+
414
+ https://github.com/facebookresearch/speech-resynthesis [paper](https://arxiv.org/abs/2104.00355)
415
+
416
+ https://github.com/jaywalnut310/vits [paper](https://arxiv.org/abs/2106.06103)
417
+
418
+ https://github.com/openai/whisper/ [paper](https://arxiv.org/abs/2212.04356)
419
+
420
+ https://github.com/NVIDIA/BigVGAN [paper](https://arxiv.org/abs/2206.04658)
421
+
422
+ https://github.com/mindslab-ai/univnet [paper](https://arxiv.org/abs/2106.07889)
423
+
424
+ https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf
425
+
426
+ https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS
427
+
428
+ https://github.com/brentspell/hifi-gan-bwe
429
+
430
+ https://github.com/mozilla/TTS
431
+
432
+ https://github.com/bshall/soft-vc
433
+
434
+ https://github.com/maxrmorrison/torchcrepe
435
+
436
+ https://github.com/MoonInTheRiver/DiffSinger
437
+
438
+ https://github.com/OlaWod/FreeVC [paper](https://arxiv.org/abs/2210.15418)
439
+
440
+ https://github.com/yl4579/HiFTNet [paper](https://arxiv.org/abs/2309.09493)
441
+
442
+ [Autoregressive neural f0 model for statistical parametric speech synthesis](https://web.archive.org/web/20210718024752id_/https://ieeexplore.ieee.org/ielx7/6570655/8356719/08341752.pdf)
443
+
444
+ [One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization](https://arxiv.org/abs/1904.05742)
445
+
446
+ [SNAC : Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech](https://github.com/hcy71o/SNAC)
447
+
448
+ [Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers](https://arxiv.org/abs/2211.00585)
449
+
450
+ [AdaSpeech: Adaptive Text to Speech for Custom Voice](https://arxiv.org/pdf/2103.00993.pdf)
451
+
452
+ [AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation](https://arxiv.org/pdf/2206.00208.pdf)
453
+
454
+ [Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis](https://github.com/ubisoft/ubisoft-laforge-daft-exprt)
455
+
456
+ [Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings](https://arxiv.org/abs/2305.05401)
457
+
458
+ [Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion](https://arxiv.org/pdf/2305.09167.pdf)
459
+
460
+ [Multilingual Speech Synthesis and Cross-Language Voice Cloning: GRL](https://arxiv.org/abs/1907.04448)
461
+
462
+ [RoFormer: Enhanced Transformer with rotary position embedding](https://arxiv.org/abs/2104.09864)
463
+
464
+ ## Method of Preventing Timbre Leakage Based on Data Perturbation
465
+
466
+ https://github.com/auspicious3000/contentvec/blob/main/contentvec/data/audio/audio_utils_1.py
467
+
468
+ https://github.com/revsic/torch-nansy/blob/main/utils/augment/praat.py
469
+
470
+ https://github.com/revsic/torch-nansy/blob/main/utils/augment/peq.py
471
+
472
+ https://github.com/biggytruck/SpeechSplit2/blob/main/utils.py
473
+
474
+ https://github.com/OlaWod/FreeVC/blob/main/preprocess_sr.py
475
+
476
+ ## Contributors
477
+
478
+ <a href="https://github.com/PlayVoice/so-vits-svc/graphs/contributors">
479
+ <img src="https://contrib.rocks/image?repo=PlayVoice/so-vits-svc" />
480
+ </a>
481
+
482
+ ## Thanks to
483
+
484
+ https://github.com/Francis-Komizu/Sovits
485
+
486
+ ## Relevant Projects
487
+ - [LoRA-SVC](https://github.com/PlayVoice/lora-svc): decoder only svc
488
+ - [Grad-SVC](https://github.com/PlayVoice/Grad-SVC): diffusion based svc
489
+
490
+ ## Original evidence
491
+ 2022.04.12 https://mp.weixin.qq.com/s/autNBYCsG4_SvWt2-Ll_zA
492
+
493
+ 2022.04.22 https://github.com/PlayVoice/VI-SVS
494
+
495
+ 2022.07.26 https://mp.weixin.qq.com/s/qC4TJy-4EVdbpvK2cQb1TA
496
+
497
+ 2022.09.08 https://github.com/PlayVoice/VI-SVC
498
+
499
+ ## Be copied by svc-develop-team/so-vits-svc
500
+ ![coarse_f0_1](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/e2f5e5d3-d169-42c1-953f-4e1648b6da37)
README_ZH.md ADDED
@@ -0,0 +1,418 @@
+ <div align="center">
+ <h1> Variational Inference with adversarial learning for end-to-end Singing Voice Conversion based on VITS </h1>
+
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/maxmax20160403/sovits5.0)
+ <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/PlayVoice/so-vits-svc-5.0">
+ <img alt="GitHub forks" src="https://img.shields.io/github/forks/PlayVoice/so-vits-svc-5.0">
+ <img alt="GitHub issues" src="https://img.shields.io/github/issues/PlayVoice/so-vits-svc-5.0">
+ <img alt="GitHub" src="https://img.shields.io/github/license/PlayVoice/so-vits-svc-5.0">
+
+ </div>
+
+ ### This project uses a clean, straightforward code structure intended for deep learning research
+ ### For learning purposes, the project does not chase maximum quality; it is designed with students' laptops in mind, using low-end configuration parameters, and the final pretrained model is 202 MB (generator plus discriminator, stored as float32), far smaller than comparable projects
+ ### If you are looking for a ready-to-use project, this one is not for you
+
+ - The target audience is deep learning beginners; basic Python and PyTorch skills are a prerequisite for using this project;
+ - This project aims to help deep learning beginners move past dry, purely theoretical study and master the basics through hands-on practice;
+ - This project does not support real-time voice conversion; (supporting it would require replacing whisper)
+ - This project will not develop one-click packages for other purposes
+ ### Code walkthrough course (in Chinese)
+ - 1 - Overall framework https://www.bilibili.com/video/BV1Tj411e7pQ
+ - 2 - Data preparation and preprocessing https://www.bilibili.com/video/BV1uj411v7zW
+ - 3 - Prior and posterior encoders https://www.bilibili.com/video/BV1Be411Q7r5
+ - 4 - The decoder https://www.bilibili.com/video/BV19u4y1b73U
+ - 5 - The Snake activation function https://www.bilibili.com/video/BV1HN4y1D7AR
+ - 6 - The Flow https://www.bilibili.com/video/BV1ju411F7Fs
+ - 7 - Training and the loss functions https://www.bilibili.com/video/BV1qw411W73B
+ - 8 - Training, inference, and F0 correction https://www.bilibili.com/video/BV1eb4y1u7ER
+
+ ![vits-5.0-frame](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/3854b281-8f97-4016-875b-6eb663c92466)
+
+ - [No leakage] Multi-speaker support
+
+ - [Timbre mixing] Create your own unique speaker
+
+ - [With accompaniment] Conversion works even with light accompaniment
+
+ - [Excel tuning] Hand-edit the raw F0 in Excel
+
+ https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/63858332-cc0d-40e1-a216-6fe8bf638f7c
+
+ Powered by [@ShadowVap](https://space.bilibili.com/491283091)
+
+ ## Model properties
+
+ | Feature | From | Status | Function |
+ | :--- | :--- | :--- | :--- |
+ | whisper | OpenAI | ✅ | strong noise immunity |
+ | bigvgan | NVIDIA | ✅ | anti-aliasing and Snake activation: clearer formants, noticeably better sound quality |
+ | natural speech | Microsoft | ✅ | reduces mispronunciation |
+ | neural source-filter | NII | ✅ | fixes F0 discontinuity |
+ | speaker encoder | Google | ✅ | timbre encoding and clustering |
+ | GRL for speaker | Ubisoft | ✅ | adversarial timbre removal |
+ | SNAC | Samsung | ✅ | one-shot cloning for VITS |
+ | SCLN | Microsoft | ✅ | improves cloning |
+ | PPG perturbation | this project | ✅ | improves noise immunity and de-timbre |
+ | HuBERT perturbation | this project | ✅ | improves noise immunity and de-timbre |
+ | VAE perturbation | this project | ✅ | improves sound quality |
+ | Mix encoder | this project | ✅ | improves conversion stability |
+ | USP inference | this project | ✅ | improves conversion stability |
+
+ **USP: even unvoiced and silent segments carry a pitch at inference time, and this pitch smoothly connects the voiced segments**
+ ![vits_svc_usp](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/ba733b48-8a89-4612-83e0-a0745587d150)
+
+ ## Why mix
+
+ ![mix_frame](https://github.com/PlayVoice/whisper-vits-svc/assets/16432329/3ffa1be0-1a21-4752-96b5-6220f98f2313)
+
+ ## Setup environment
+
+ 1. Install [PyTorch](https://pytorch.org/get-started/locally/)
+
+ 2. Install project dependencies
+ ```
+ pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
+ ```
+ **Note: do not install whisper separately, or it will conflict with the whisper code built into this project**
+
+ 3. Download the [timbre encoder](https://drive.google.com/drive/folders/15oeBYf6Qn1edONkVLXe82MzdIi3O_9m3) and put `best_model.pth.tar` into `speaker_pretrain/` (**do not unzip it**)
+
+ 4. Download the [whisper-large-v2 model](https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt) and put `large-v2.pt` into `whisper_pretrain/`
+
+ 5. Download the [hubert_soft model](https://github.com/bshall/hubert/releases/tag/v0.1) and put `hubert-soft-0d54a1f4.pt` into `hubert_pretrain/`
+
+ 6. Download the pitch extractor [crepe full](https://github.com/maxrmorrison/torchcrepe/tree/master/torchcrepe/assets) and put `full.pth` into `crepe/assets`
+
+ **Note: full.pth is 84.9 MB; please verify the file size**
+
+ 7. Download [sovits5.0.pretrain.pth](https://github.com/PlayVoice/so-vits-svc-5.0/releases/tag/5.0/), put it into `vits_pretrain/`, and run an inference test
+
+ > python svc_inference.py --config configs/base.yaml --model ./vits_pretrain/sovits5.0.pretrain.pth --spk ./configs/singers/singer0001.npy --wave test.wav
+
+ ## Dataset preparation
+ 1. Vocal separation; skip this step if the dataset has no BGM (to extract the vocals, the 3_HP-Vocal-UVR or htdemucs_ft models in [UVR](https://github.com/Anjok07/ultimatevocalremovergui) are recommended)
+ 2. Cut the audio with [slicer](https://github.com/flutydeer/audio-slicer); whisper requires clips shorter than 30 seconds (clips under 2 seconds are best discarded, as they rarely contain phonemes and may hurt training)
+ 3. Manually screen the audio produced by steps 1 and 2, trimming or discarding clips with obvious noise; skip this step if the dataset has no BGM
+ 4. Balance the loudness with Adobe Audition
+ 5. Put the dataset into the `dataset_raw` directory following the structure below
+ ```shell
+ dataset_raw
+ ├───speaker0
+ │   ├───000001.wav
+ │   ├───...
+ │   └───000xxx.wav
+ └───speaker1
+     ├───000001.wav
+     ├───...
+     └───000xxx.wav
+ ```
+
+ ## Data preprocessing
+
+ ```shell
+ python svc_preprocessing.py -t 2
+ ```
+ -t: number of threads; it must be a positive integer no larger than the total CPU core count, and 2 is usually fine
+
+ After preprocessing, the folder structure looks like this
+ ```shell
+ data_svc/
+ └── waves-16k
+ │    └── speaker0
+ │    │     ├── 000001.wav
+ │    │     └── 000xxx.wav
+ │    └── speaker1
+ │          ├── 000001.wav
+ │          └── 000xxx.wav
+ └── waves-32k
+ │    └── speaker0
+ │    │     ├── 000001.wav
+ │    │     └── 000xxx.wav
+ │    └── speaker1
+ │          ├── 000001.wav
+ │          └── 000xxx.wav
+ └── pitch
+ │    └── speaker0
+ │    │     ├── 000001.pit.npy
+ │    │     └── 000xxx.pit.npy
+ │    └── speaker1
+ │          ├── 000001.pit.npy
+ │          └── 000xxx.pit.npy
+ └── hubert
+ │    └── speaker0
+ │    │     ├── 000001.vec.npy
+ │    │     └── 000xxx.vec.npy
+ │    └── speaker1
+ │          ├── 000001.vec.npy
+ │          └── 000xxx.vec.npy
+ └── whisper
+ │    └── speaker0
+ │    │     ├── 000001.ppg.npy
+ │    │     └── 000xxx.ppg.npy
+ │    └── speaker1
+ │          ├── 000001.ppg.npy
+ │          └── 000xxx.ppg.npy
+ └── speaker
+ │    └── speaker0
+ │    │     ├── 000001.spk.npy
+ │    │     └── 000xxx.spk.npy
+ │    └── speaker1
+ │          ├── 000001.spk.npy
+ │          └── 000xxx.spk.npy
+ └── singer
+      ├── speaker0.spk.npy
+      └── speaker1.spk.npy
+ ```
+
+ If you have some programming background, it is recommended to run the data processing step by step, which also helps you learn how it works internally
+
+ - 1. Re-sampling
+
+   Generate audio with a sampling rate of 16000 Hz in ./data_svc/waves-16k
+
+   > python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-16k -s 16000
+
+   Generate audio with a sampling rate of 32000 Hz in ./data_svc/waves-32k
+
+   > python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-32k -s 32000
+
+ - 2. Use the 16k audio to extract the pitch
+
+   > python prepare/preprocess_crepe.py -w data_svc/waves-16k/ -p data_svc/pitch
+
+ - 3. Use the 16k audio to extract the content encoding (whisper)
+   > python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper
+
+ - 4. Use the 16k audio to extract the content vector (hubert)
+   > python prepare/preprocess_hubert.py -w data_svc/waves-16k/ -v data_svc/hubert
+
+ - 5. Use the 16k audio to extract the timbre encoding
+   > python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker
+
+ - 6. Extract the mean of the timbre encodings; it is used for inference, and can also serve as the speaker's unified timbre when generating the training index (when the timbre varies little across the data)
+   > python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer
+
+ - 7. Use the 32k audio to extract the linear spectrogram
+   > python prepare/preprocess_spec.py -w data_svc/waves-32k/ -s data_svc/specs
+
+ - 8. Use the 32k audio to generate the training index
+   > python prepare/preprocess_train.py
+
+ - 9. Training file debugging
+   > python prepare/preprocess_zzz.py
+
+ ## Train
+ 0. Parameter tuning (a config sketch appears after this list)
+    If fine-tuning from the pretrained model, download [sovits5.0.pretrain.pth](https://github.com/PlayVoice/so-vits-svc-5.0/releases/tag/5.0) and put it in the project root<br>
+    then set `pretrain: "./vits_pretrain/sovits5.0.pretrain.pth"` in `configs/base.yaml` and lower the learning rate appropriately (try starting from 5e-5)<br>
+    **learning_rate, batch_size, and accum_step are three tightly coupled parameters that need careful tuning**<br>
+    **batch_size times accum_step usually equals 16 or 32; for low-VRAM GPUs, try batch_size = 4, accum_step = 4**
+
+ 1. Start training
+ ```
+ python svc_trainer.py -c configs/base.yaml -n sovits5.0
+ ```
+ 2. Resume training
+ ```
+ python svc_trainer.py -c configs/base.yaml -n sovits5.0 -p chkpt/sovits5.0/sovits5.0_***.pt
+ ```
+ 3. Log visualization
+ ```
+ tensorboard --logdir logs/
+ ```
+
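+ A config sketch for item 0 above (the key names follow the parameters mentioned there; the actual layout of `configs/base.yaml` may differ):
+
+ ```yaml
+ train:
+   pretrain: "./vits_pretrain/sovits5.0.pretrain.pth"
+   learning_rate: 5.0e-5
+   batch_size: 4
+   accum_step: 4   # effective batch size = batch_size x accum_step = 16
+ ```
+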
+ ![sovits5 0_base](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/1628e775-5888-4eac-b173-a28dca978faa)
+
+ ![sovits_spec](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/c4223cf3-b4a0-4325-bec0-6d46d195a1fc)
+
+ ## Inference
+ 1. Export the inference model: text encoder, Flow network, and Decoder network; the discriminator, posterior encoder, etc. are only used during training
+ ```
+ python svc_export.py --config configs/base.yaml --checkpoint_path chkpt/sovits5.0/***.pt
+ ```
+ 2. Inference
+ - If you do not want to adjust f0 manually and only need the final result, just run the following command
+ ```
+ python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --shift 0
+ ```
+ - If f0 needs manual adjustment, follow the flow below
+
+   - Use whisper to extract the content encoding, generating test.ppg.npy
+   ```
+   python whisper/inference.py -w test.wav -p test.ppg.npy
+   ```
+
+   - Use hubert to extract the content vector, generating test.vec.npy
+   ```
+   python hubert/inference.py -w test.wav -v test.vec.npy
+   ```
+
+   - Extract the F0 parameters in CSV text format, open the CSV file in Excel, and manually correct wrong F0 values by referring to Audition or SonicVisualiser
+   ```
+   python pitch/inference.py -w test.wav -p test.csv
+   ```
+   - Final inference
+   ```
+   python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --ppg test.ppg.npy --vec test.vec.npy --pit test.csv --shift 0
+   ```
+
+ 3. Notes
+    When --ppg is specified, repeated inference on the same audio skips re-extracting the content encoding; if unspecified, it is extracted automatically
+
+    When --vec is specified, repeated inference on the same audio skips re-extracting the content vector; if unspecified, it is extracted automatically
+
+    When --pit is specified, the hand-tuned F0 parameters are loaded; if unspecified, they are extracted automatically
+
+    The output file svc_out.wav is written to the current directory
+
+ | args | --config | --model | --spk | --wave | --ppg | --vec | --pit | --shift |
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | name | config file | model file | timbre file | audio file | ppg content | hubert content | pitch content | pitch shift |
+
+ 4. De-noising post-processing
+ ```
+ python svc_inference_post.py --ref test.wav --svc svc_out.wav --out svc_out_post.wav
+ ```
+
+ ## Two training modes
+ - Scattered mode: the training index uses each utterance's own timbre file
+ - Unified mode: the training index uses the speaker's unified timbre file
+
+ **Open question: in which situations is which mode better?**
+
+ ## Model fusion
+ ```
+ python svc_merge.py --model1 model_1.pt --model2 model_2.pt --rate <model_1's share (0~1)>
+ ```
+ Fusing models from different epochs yields more averaged performance and reduces overfitting
+
+ For example: python svc_merge.py --model1 chkpt\sovits5.0\sovits5.0_1045.pt --model2 chkpt\sovits5.0\sovits5.0_1050.pt --rate 0.4
+
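+ Conceptually this is a linear interpolation of the two checkpoints' weights; a minimal sketch, assuming each checkpoint loads as a flat dict of tensors with matching keys (the real checkpoints may carry extra training state):
+
+ ```python
+ import torch
+
+ def merge(path1: str, path2: str, rate: float, out: str = "merged.pt") -> None:
+     # rate is model1's share; 1 - rate goes to model2.
+     m1 = torch.load(path1, map_location="cpu")
+     m2 = torch.load(path2, map_location="cpu")
+     torch.save({k: rate * m1[k] + (1.0 - rate) * m2[k] for k in m1}, out)
+ ```
+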
+ ## Timbre mixing (Create singer)
+ Named by pure coincidence: average -> ave -> eva; Eve represents conception and reproduction
+ ```
+ python svc_eva.py
+ ```
+ ```python
+ eva_conf = {
+     './configs/singers/singer0022.npy': 0,
+     './configs/singers/singer0030.npy': 0,
+     './configs/singers/singer0047.npy': 0.5,
+     './configs/singers/singer0051.npy': 0.5,
+ }
+ ```
+
+ The generated timbre file will be eva.spk.npy
+
+ ## Datasets
+
+ | Name | URL |
+ | :--- | :--- |
+ |KiSing |http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/|
+ |PopCS |https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md|
+ |opencpop |https://wenet.org.cn/opencpop/download/|
+ |Multi-Singer |https://github.com/Multi-Singer/Multi-Singer.github.io|
+ |M4Singer |https://github.com/M4Singer/M4Singer/blob/master/apply_form.md|
+ |CSD |https://zenodo.org/record/4785016#.YxqrTbaOMU4|
+ |KSS |https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset|
+ |JVS MuSic |https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music|
+ |PJS |https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus|
+ |JUST Song |https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song|
+ |MUSDB18 |https://sigsep.github.io/datasets/musdb.html#musdb18-compressed-stems|
+ |DSD100 |https://sigsep.github.io/datasets/dsd100.html|
+ |Aishell-3 |http://www.aishelltech.com/aishell_3|
+ |VCTK |https://datashare.ed.ac.uk/handle/10283/2651|
+ |Korean Songs |http://urisori.co.kr/urisori-en/doku.php/|
+
+ ## Code sources and references
+
+ https://github.com/facebookresearch/speech-resynthesis [paper](https://arxiv.org/abs/2104.00355)
+
+ https://github.com/jaywalnut310/vits [paper](https://arxiv.org/abs/2106.06103)
+
+ https://github.com/openai/whisper/ [paper](https://arxiv.org/abs/2212.04356)
+
+ https://github.com/NVIDIA/BigVGAN [paper](https://arxiv.org/abs/2206.04658)
+
+ https://github.com/mindslab-ai/univnet [paper](https://arxiv.org/abs/2106.07889)
+
+ https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf
+
+ https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS
+
+ https://github.com/brentspell/hifi-gan-bwe
+
+ https://github.com/mozilla/TTS
+
+ https://github.com/bshall/soft-vc
+
+ https://github.com/maxrmorrison/torchcrepe
+
+ https://github.com/MoonInTheRiver/DiffSinger
+
+ https://github.com/OlaWod/FreeVC [paper](https://arxiv.org/abs/2210.15418)
+
+ https://github.com/yl4579/HiFTNet [paper](https://arxiv.org/abs/2309.09493)
+
+ [One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization](https://arxiv.org/abs/1904.05742)
+
+ [SNAC: Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech](https://github.com/hcy71o/SNAC)
+
+ [Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers](https://arxiv.org/abs/2211.00585)
+
+ [AdaSpeech: Adaptive Text to Speech for Custom Voice](https://arxiv.org/pdf/2103.00993.pdf)
+
+ [AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation](https://arxiv.org/pdf/2206.00208.pdf)
+
+ [Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis](https://github.com/ubisoft/ubisoft-laforge-daft-exprt)
+
+ [Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings](https://arxiv.org/abs/2305.05401)
+
+ [Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion](https://arxiv.org/pdf/2305.09167.pdf)
+
+ [Multilingual Speech Synthesis and Cross-Language Voice Cloning: GRL](https://arxiv.org/abs/1907.04448)
+
+ [RoFormer: Enhanced Transformer with rotary position embedding](https://arxiv.org/abs/2104.09864)
+
+ ## Method of preventing timbre leakage based on data perturbation
+
+ https://github.com/auspicious3000/contentvec/blob/main/contentvec/data/audio/audio_utils_1.py
+
+ https://github.com/revsic/torch-nansy/blob/main/utils/augment/praat.py
+
+ https://github.com/revsic/torch-nansy/blob/main/utils/augment/peq.py
+
+ https://github.com/biggytruck/SpeechSplit2/blob/main/utils.py
+
+ https://github.com/OlaWod/FreeVC/blob/main/preprocess_sr.py
+
+ ## Contributors
+
+ <a href="https://github.com/PlayVoice/so-vits-svc/graphs/contributors">
+   <img src="https://contrib.rocks/image?repo=PlayVoice/so-vits-svc" />
+ </a>
+
+ ## Special thanks
+
+ https://github.com/Francis-Komizu/Sovits
+
+ ## Original development timeline
+ 2022.04.12 https://mp.weixin.qq.com/s/autNBYCsG4_SvWt2-Ll_zA
+
+ 2022.04.22 https://github.com/PlayVoice/VI-SVS
+
+ 2022.07.26 https://mp.weixin.qq.com/s/qC4TJy-4EVdbpvK2cQb1TA
+
+ 2022.09.08 https://github.com/PlayVoice/VI-SVC
+
+ ## Copied by svc-develop-team/so-vits-svc
+ ![coarse_f0_1](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/e2f5e5d3-d169-42c1-953f-4e1648b6da37)
+
+ ![coarse_f0_2](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/f3539c83-7c8a-425e-bf20-2c402132f0f4)
+
+ ![coarse_f0_3](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/f3cee94a-0eeb-4189-b9bb-7043d06e62ef)
+
+ ## Rcell's actual response to the copying
+
+ ![Rcell](https://github.com/PlayVoice/so-vits-svc-5.0/assets/16432329/8ebb236d-e233-4cea-9359-8e44029b5af5)
__pycache__/svc_inference.cpython-310.pyc ADDED
Binary file (6.85 kB)
 
app.py ADDED
@@ -0,0 +1,444 @@
1
+ import os
2
+ import subprocess
3
+ import yaml
4
+ import sys
5
+ import webbrowser
6
+ import gradio as gr
7
+ from ruamel.yaml import YAML
8
+ import shutil
9
+ import soundfile
10
+ import shlex
11
+ import locale
12
+
13
+ class WebUI:
14
+ def __init__(self):
15
+ self.train_config_path = 'configs/train.yaml'
16
+ self.info = Info()
17
+ self.names = []
18
+ self.names2 = []
19
+ self.voice_names = []
20
+ self.base_config_path = 'configs/base.yaml'
21
+ if not os.path.exists(self.train_config_path):
22
+ shutil.copyfile(self.base_config_path, self.train_config_path)
23
+ print(i18n("初始化成功"))
24
+ else:
25
+ print(i18n("就绪"))
26
+ self.main_ui()
27
+
28
+ def main_ui(self):
29
+ with gr.Blocks(theme=gr.themes.Base(primary_hue=gr.themes.colors.green)) as ui:
30
+
31
+ gr.Markdown('# so-vits-svc5.0 WebUI')
32
+
33
+ with gr.Tab(i18n("预处理-训练")):
34
+
35
+ with gr.Accordion(i18n('训练说明'), open=False):
36
+
37
+ gr.Markdown(self.info.train)
38
+
39
+ gr.Markdown(i18n('### 预处理参数设置'))
40
+
41
+ with gr.Row():
42
+
43
+ self.model_name = gr.Textbox(value='sovits5.0', label='model', info=i18n('模型名称'), interactive=True) #建议设置为不可修改
44
+
45
+ self.f0_extractor = gr.Textbox(value='crepe', label='f0_extractor', info=i18n('f0提取器'), interactive=False)
46
+
47
+ self.thread_count = gr.Slider(minimum=1, maximum=os.cpu_count(), step=1, value=2, label='thread_count', info=i18n('预处理线程数'), interactive=True)
48
+
49
+ gr.Markdown(i18n('### 训练参数设置'))
50
+
51
+ with gr.Row():
52
+
53
+ self.learning_rate = gr.Number(value=5e-5, label='learning_rate', info=i18n('学习率'), interactive=True)
54
+
55
+ self.batch_size = gr.Slider(minimum=1, maximum=50, step=1, value=6, label='batch_size', info=i18n('批大小'), interactive=True)
56
+
57
+ with gr.Row():
58
+
59
+ self.info_interval = gr.Number(value=50, label='info_interval', info=i18n('训练日志记录间隔(step)'), interactive=True)
60
+
61
+ self.eval_interval = gr.Number(value=1, label='eval_interval', info=i18n('验证集验证间隔(epoch)'), interactive=True)
62
+
63
+ self.save_interval = gr.Number(value=5, label='save_interval', info=i18n('检查点保存间隔(epoch)'), interactive=True)
64
+
65
+ self.keep_ckpts = gr.Number(value=0, label='keep_ckpts', info=i18n('保留最新的检查点文件(0保存全部)'),interactive=True)
66
+
67
+ with gr.Row():
68
+
69
+ self.slow_model = gr.Checkbox(label=i18n("是否添加底模"), value=True, interactive=True)
70
+
71
+ gr.Markdown(i18n('### 开始训练'))
72
+
73
+ with gr.Row():
74
+
75
+ self.bt_open_dataset_folder = gr.Button(value=i18n('打开数据集文件夹'))
76
+
77
+ self.bt_onekey_train = gr.Button(i18n('一键训练'), variant="primary")
78
+
79
+ self.bt_tb = gr.Button(i18n('启动Tensorboard'), variant="primary")
80
+
81
+ gr.Markdown(i18n('### 恢复训练'))
82
+
83
+ with gr.Row():
84
+
85
+ self.resume_model = gr.Dropdown(choices=sorted(self.names), label='Resume training progress from checkpoints', info=i18n('从检查点恢复训练进度'), interactive=True)
86
+
87
+ with gr.Column():
88
+
89
+ self.bt_refersh = gr.Button(i18n('刷新'))
90
+
91
+ self.bt_resume_train = gr.Button(i18n('恢复训练'), variant="primary")
92
+
93
+ with gr.Tab(i18n("推理")):
94
+
95
+ with gr.Accordion(i18n('推理说明'), open=False):
96
+
97
+ gr.Markdown(self.info.inference)
98
+
99
+ gr.Markdown(i18n('### 推理参数设置'))
100
+
101
+ with gr.Row():
102
+
103
+ with gr.Column():
104
+
105
+ self.keychange = gr.Slider(-24, 24, value=0, step=1, label=i18n('变调'))
106
+
107
+ self.file_list = gr.Markdown(value="", label=i18n("文件列表"))
108
+
109
+ with gr.Row():
110
+
111
+ self.resume_model2 = gr.Dropdown(choices=sorted(self.names2), label='Select the model you want to export',
112
+ info=i18n('选择要导出的模型'), interactive=True)
113
+ with gr.Column():
114
+
115
+ self.bt_refersh2 = gr.Button(value=i18n('刷新模型和音色'))
116
+
117
+
118
+ self.bt_out_model = gr.Button(value=i18n('导出模型'), variant="primary")
119
+
120
+ with gr.Row():
121
+
122
+ self.resume_voice = gr.Dropdown(choices=sorted(self.voice_names), label='Select the sound file',
123
+ info=i18n('选择音色文件'), interactive=True)
124
+
125
+ with gr.Row():
126
+
127
+ self.input_wav = gr.Audio(type='filepath', label=i18n('选择待转换音频'), source='upload')
128
+
129
+ with gr.Row():
130
+
131
+ self.bt_infer = gr.Button(value=i18n('开始转换'), variant="primary")
132
+
133
+ with gr.Row():
134
+
135
+ self.output_wav = gr.Audio(label=i18n('输出音频'), interactive=False)
136
+
137
+ self.bt_open_dataset_folder.click(fn=self.openfolder)
138
+ self.bt_onekey_train.click(fn=self.onekey_training,inputs=[self.model_name, self.thread_count,self.learning_rate,self.batch_size, self.info_interval, self.eval_interval,self.save_interval, self.keep_ckpts, self.slow_model])
139
+ self.bt_out_model.click(fn=self.out_model, inputs=[self.model_name, self.resume_model2])
140
+ self.bt_tb.click(fn=self.tensorboard)
141
+ self.bt_refersh.click(fn=self.refresh_model, inputs=[self.model_name], outputs=[self.resume_model])
142
+ self.bt_resume_train.click(fn=self.resume_train, inputs=[self.model_name, self.resume_model, self.learning_rate,self.batch_size, self.info_interval, self.eval_interval,self.save_interval, self.keep_ckpts, self.slow_model])
143
+ self.bt_infer.click(fn=self.inference, inputs=[self.input_wav, self.resume_voice, self.keychange], outputs=[self.output_wav])
144
+ self.bt_refersh2.click(fn=self.refresh_model_and_voice, inputs=[self.model_name],outputs=[self.resume_model2, self.resume_voice])
145
+
146
+ ui.launch(inbrowser=True, server_port=2333, share=True)
147
+
148
+ def openfolder(self):
149
+
150
+ try:
151
+ if sys.platform.startswith('win'):
152
+ os.startfile('dataset_raw')
153
+ elif sys.platform.startswith('linux'):
154
+ subprocess.call(['xdg-open', 'dataset_raw'])
155
+ elif sys.platform.startswith('darwin'):
156
+ subprocess.call(['open', 'dataset_raw'])
157
+ else:
158
+ print(i18n('打开文件夹失败!'))
159
+ except BaseException:
160
+ print(i18n('打开文件夹失败!'))
161
+
162
+ def preprocessing(self, thread_count):
163
+ print(i18n('开始预处理'))
164
+ train_process = subprocess.Popen('python -u svc_preprocessing.py -t ' + str(thread_count), stdout=subprocess.PIPE)
165
+ while train_process.poll() is None:
166
+ output = train_process.stdout.readline().decode('utf-8')
167
+ print(output, end='')
168
+
169
+ def create_config(self, model_name, learning_rate, batch_size, info_interval, eval_interval, save_interval,
170
+ keep_ckpts, slow_model):
171
+ yaml = YAML()
172
+ yaml.preserve_quotes = True
173
+ yaml.width = 1024
174
+ with open("configs/train.yaml", "r") as f:
175
+ config = yaml.load(f)
176
+ config['train']['model'] = model_name
177
+ config['train']['learning_rate'] = learning_rate
178
+ config['train']['batch_size'] = batch_size
179
+ config["log"]["info_interval"] = int(info_interval)
180
+ config["log"]["eval_interval"] = int(eval_interval)
181
+ config["log"]["save_interval"] = int(save_interval)
182
+ config["log"]["keep_ckpts"] = int(keep_ckpts)
183
+ if slow_model:
184
+ config["train"]["pretrain"] = "vits_pretrain\sovits5.0.pretrain.pth"
185
+ else:
186
+ config["train"]["pretrain"] = ""
187
+ with open("configs/train.yaml", "w") as f:
188
+ yaml.dump(config, f)
189
+ return f"{config['log']}"
190
+
191
+ def training(self, model_name):
192
+ print(i18n('开始训练'))
193
+ train_process = subprocess.Popen('python -u svc_trainer.py -c ' + self.train_config_path + ' -n ' + str(model_name), stdout=subprocess.PIPE, creationflags=subprocess.CREATE_NEW_CONSOLE)
194
+ while train_process.poll() is None:
195
+ output = train_process.stdout.readline().decode('utf-8')
196
+ print(output, end='')
197
+
198
+ def onekey_training(self, model_name, thread_count, learning_rate, batch_size, info_interval, eval_interval, save_interval, keep_ckpts, slow_model):
199
+ print(self, model_name, thread_count, learning_rate, batch_size, info_interval, eval_interval,
200
+ save_interval, keep_ckpts)
201
+ self.create_config(model_name, learning_rate, batch_size, info_interval, eval_interval, save_interval, keep_ckpts, slow_model)
202
+ self.preprocessing(thread_count)
203
+ self.training(model_name)
204
+
205
+ def out_model(self, model_name, resume_model2):
206
+ print(i18n('开始导出模型'))
207
+ try:
208
+ subprocess.Popen('python -u svc_export.py -c {} -p "chkpt/{}/{}"'.format(self.train_config_path, model_name, resume_model2),stdout=subprocess.PIPE)
209
+ print(i18n('导出模型成功'))
210
+ except Exception as e:
211
+ print(i18n("出现错误:"), e)
212
+
213
+
214
+ def tensorboard(self):
215
+ if sys.platform.startswith('win'):
216
+ tb_process = subprocess.Popen('tensorboard --logdir=logs --port=6006', stdout=subprocess.PIPE)
217
+ webbrowser.open("http://localhost:6006")
218
+ else:
219
+ p1 = subprocess.Popen(["ps", "-ef"], stdout=subprocess.PIPE) #ps -ef | grep tensorboard | awk '{print $2}' | xargs kill -9
220
+ p2 = subprocess.Popen(["grep", "tensorboard"], stdin=p1.stdout, stdout=subprocess.PIPE)
221
+ p3 = subprocess.Popen(["awk", "{print $2}"], stdin=p2.stdout, stdout=subprocess.PIPE)
222
+ p4 = subprocess.Popen(["xargs", "kill", "-9"], stdin=p3.stdout)
223
+ p1.stdout.close()
224
+ p2.stdout.close()
225
+ p3.stdout.close()
226
+ p4.communicate()
227
+ tb_process = subprocess.Popen('tensorboard --logdir=logs --port=6007', stdout=subprocess.PIPE) # AutoDL端口设置为6007
228
+ while tb_process.poll() is None:
229
+ output = tb_process.stdout.readline().decode('utf-8')
230
+ print(output)
231
+
232
+ def refresh_model(self, model_name):
233
+ self.script_dir = os.path.dirname(os.path.abspath(__file__))
234
+ self.model_root = os.path.join(self.script_dir, f"chkpt/{model_name}")
235
+ self.names = []
236
+ try:
237
+ for self.name in os.listdir(self.model_root):
238
+ if self.name.endswith(".pt"):
239
+ self.names.append(self.name)
240
+ return {"choices": sorted(self.names), "__type__": "update"}
241
+ except FileNotFoundError:
242
+ return {"label": i18n("缺少模型文件"), "__type__": "update"}
243
+
244
+ def refresh_model2(self, model_name):
245
+ self.script_dir = os.path.dirname(os.path.abspath(__file__))
246
+ self.model_root = os.path.join(self.script_dir, f"chkpt/{model_name}")
247
+ self.names2 = []
248
+ try:
249
+ for self.name in os.listdir(self.model_root):
250
+ if self.name.endswith(".pt"):
251
+ self.names2.append(self.name)
252
+ return {"choices": sorted(self.names2), "__type__": "update"}
253
+ except FileNotFoundError:
254
+ return {"label": i18n("缺少模型文件"), "__type__": "update"}
255
+
256
+ def refresh_voice(self):
257
+ self.script_dir = os.path.dirname(os.path.abspath(__file__))
258
+ self.model_root = os.path.join(self.script_dir, "data_svc/singer")
259
+ self.voice_names = []
260
+ try:
261
+ for self.name in os.listdir(self.model_root):
262
+ if self.name.endswith(".npy"):
263
+ self.voice_names.append(self.name)
264
+ return {"choices": sorted(self.voice_names), "__type__": "update"}
265
+ except FileNotFoundError:
266
+ return {"label": i18n("缺少文件"), "__type__": "update"}
267
+
268
+ def refresh_model_and_voice(self, model_name):
269
+ model_update = self.refresh_model2(model_name)
270
+ voice_update = self.refresh_voice()
271
+ return model_update, voice_update
272
+
273
+    def resume_train(self, model_name, resume_model, learning_rate, batch_size, info_interval, eval_interval, save_interval, keep_ckpts, slow_model):
+        print(i18n('开始恢复训练'))
+        self.create_config(model_name, learning_rate, batch_size, info_interval, eval_interval, save_interval, keep_ckpts, slow_model)
+        cmd = shlex.split('python -u svc_trainer.py -c {} -n {} -p "chkpt/{}/{}"'.format(self.train_config_path, model_name, model_name, resume_model))
+        # CREATE_NEW_CONSOLE exists only on Windows; use no extra flags elsewhere
+        flags = subprocess.CREATE_NEW_CONSOLE if sys.platform.startswith('win') else 0
+        train_process = subprocess.Popen(cmd, stdout=subprocess.PIPE, creationflags=flags)
+        while train_process.poll() is None:
+            output = train_process.stdout.readline().decode('utf-8')
+            print(output, end='')
+
+    def inference(self, input, resume_voice, keychange):
+        if os.path.exists("test.wav"):
+            os.remove("test.wav")
+            print(i18n("已清理残留文件"))
+        else:
+            print(i18n("无需清理残留文件"))
+        self.train_config_path = 'configs/train.yaml'
+        print(i18n('开始推理'))
+        shutil.copy(input, ".")
+        input_name = os.path.basename(input)
+        # Convert non-wav uploads before renaming; checking after the rename (as the
+        # original code did) can never trigger, since the name is already "test.wav"
+        if not input_name.endswith(".wav"):
+            data, samplerate = soundfile.read(input_name)
+            wav_name = input_name.rsplit(".", 1)[0] + ".wav"
+            soundfile.write(wav_name, data, samplerate)
+            input_name = wav_name
+        os.rename(input_name, "test.wav")
+        train_config_path = shlex.quote(self.train_config_path)
+        keychange = shlex.quote(str(keychange))
+        cmd = ["python", "-u", "svc_inference.py", "--config", train_config_path, "--model", "sovits5.0.pth", "--spk",
+               f"data_svc/singer/{resume_voice}", "--wave", "test.wav", "--shift", keychange]
+        train_process = subprocess.run(cmd, shell=False, capture_output=True, text=True)
+        print(train_process.stdout)
+        print(train_process.stderr)
+        print(i18n("推理成功"))
+        return "svc_out.wav"
+
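+    # For reference, with illustrative values (speaker file and shift are examples,
+    # not defaults) the command list above expands to:
+    #   python -u svc_inference.py --config configs/train.yaml --model sovits5.0.pth \
+    #     --spk data_svc/singer/singer0002.npy --wave test.wav --shift 0
+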
+class Info:
+    def __init__(self) -> None:
+        self.train = i18n('### 2023.7.11|[@OOPPEENN](https://github.com/OOPPEENN)第一次编写|[@thestmitsuk](https://github.com/thestmitsuki)二次补完')
+
+        self.inference = i18n('### 2023.7.11|[@OOPPEENN](https://github.com/OOPPEENN)第一次编写|[@thestmitsuk](https://github.com/thestmitsuki)二次补完')
+
+
+LANGUAGE_LIST = ['zh_CN', 'en_US']
+LANGUAGE_ALL = {
+    'zh_CN': {
+        'SUPER': 'END',
+        'LANGUAGE': 'zh_CN',
+        '初始化成功': '初始化成功',
+        '就绪': '就绪',
+        '预处理-训练': '预处理-训练',
+        '训练说明': '训练说明',
+        '### 预处理参数设置': '### 预处理参数设置',
+        '模型名称': '模型名称',
+        'f0提取器': 'f0提取器',
+        '预处理线程数': '预处理线程数',
+        '### 训练参数设置': '### 训练参数设置',
+        '学习率': '学习率',
+        '批大小': '批大小',
+        '训练日志记录间隔(step)': '训练日志记录间隔(step)',
+        '验证集验证间隔(epoch)': '验证集验证间隔(epoch)',
+        '检查点保存间隔(epoch)': '检查点保存间隔(epoch)',
+        '保留最新的检查点文件(0保存全部)': '保留最新的检查点文件(0保存全部)',
+        '是否添加底模': '是否添加底模',
+        '### 开始训练': '### 开始训练',
+        '打开数据集文件夹': '打开数据集文件夹',
+        '一键训练': '一键训练',
+        '启动Tensorboard': '启动Tensorboard',
+        '### 恢复训练': '### 恢复训练',
+        '从检查点恢复训练进度': '从检查点恢复训练进度',
+        '刷新': '刷新',
+        '恢复训练': '恢复训练',
+        '开始恢复训练': '开始恢复训练',
+        '推理': '推理',
+        '推理说明': '推理说明',
+        '### 推理参数设置': '### 推理参数设置',
+        '变调': '变调',
+        '文件列表': '文件列表',
+        '选择要导出的模型': '选择要导出的模型',
+        '刷新模型和音色': '刷新模型和音色',
+        '导出模型': '导出模型',
+        '选择音色文件': '选择音色文件',
+        '选择待转换音频': '选择待转换音频',
+        '开始转换': '开始转换',
+        '输出音频': '输出音频',
+        '打开文件夹失败!': '打开文件夹失败!',
+        '开始预处理': '开始预处理',
+        '开始训练': '开始训练',
+        '开始导出模型': '开始导出模型',
+        '导出模型成功': '导出模型成功',
+        '出现错误:': '出现错误:',
+        '缺少模型文件': '缺少模型文件',
+        '缺少文件': '缺少文件',
+        '已清理残留文件': '已清理残留文件',
+        '无需清理残留文件': '无需清理残留文件',
+        '开始推理': '开始推理',
+        '推理成功': '推理成功',
+        '### 2023.7.11|[@OOPPEENN](https://github.com/OOPPEENN)第一次编写|[@thestmitsuk](https://github.com/thestmitsuki)二次补完': '### 2023.7.11|[@OOPPEENN](https://github.com/OOPPEENN)第一次编写|[@thestmitsuk](https://github.com/thestmitsuki)二次补完'
+    },
+    'en_US': {
+        'SUPER': 'zh_CN',
+        'LANGUAGE': 'en_US',
+        '初始化成功': 'Initialization successful',
+        '就绪': 'Ready',
+        '预处理-训练': 'Preprocessing-Training',
+        '训练说明': 'Training instructions',
+        '### 预处理参数设置': '### Preprocessing parameter settings',
+        '模型名称': 'Model name',
+        'f0提取器': 'f0 extractor',
+        '预处理线程数': 'Number of preprocessing threads',
+        '### 训练参数设置': '### Training parameter settings',
+        '学习率': 'Learning rate',
+        '批大小': 'Batch size',
+        '训练日志记录间隔(step)': 'Training log recording interval (step)',
+        '验证集验证间隔(epoch)': 'Validation interval (epoch)',
+        '检查点保存间隔(epoch)': 'Checkpoint save interval (epoch)',
+        '保留最新的检查点文件(0保存全部)': 'Keep the latest checkpoint files (0 = keep all)',
+        '是否添加底模': 'Whether to add the base model',
+        '### 开始训练': '### Start training',
+        '打开数据集文件夹': 'Open the dataset folder',
+        '一键训练': 'One-click training',
+        '启动Tensorboard': 'Start Tensorboard',
+        '### 恢复训练': '### Resume training',
+        '从检查点恢复训练进度': 'Restore training progress from checkpoint',
+        '刷新': 'Refresh',
+        '恢复训练': 'Resume training',
+        '开始恢复训练': 'Start resuming training',
+        '推理': 'Inference',
+        '推理说明': 'Inference instructions',
+        '### 推理参数设置': '### Inference parameter settings',
+        '变调': 'Pitch shift',
+        '文件列表': 'File list',
+        '选择要导出的模型': 'Select the model to export',
+        '刷新模型和音色': 'Refresh model and timbre',
+        '导出模型': 'Export model',
+        '选择音色文件': 'Select timbre file',
+        '选择待转换音频': 'Select audio to be converted',
+        '开始转换': 'Start conversion',
+        '输出音频': 'Output audio',
+        '打开文件夹失败!': 'Failed to open folder!',
+        '开始预处理': 'Start preprocessing',
+        '开始训练': 'Start training',
+        '开始导出模型': 'Start exporting model',
+        '导出模型成功': 'Model exported successfully',
+        '出现错误:': 'An error occurred:',
+        '缺少模型文件': 'Missing model file',
+        '缺少文件': 'Missing file',
+        '已清理残留文件': 'Residual files cleaned up',
+        '无需清理残留文件': 'No need to clean up residual files',
+        '开始推理': 'Start inference',
+        '推理成功': 'Inference successful',
+        '### 2023.7.11|[@OOPPEENN](https://github.com/OOPPEENN)第一次编写|[@thestmitsuk](https://github.com/thestmitsuki)二次补完': '### 2023.7.11|[@OOPPEENN](https://github.com/OOPPEENN) first draft|[@thestmitsuk](https://github.com/thestmitsuki) second pass'
+    }
+}
+
+class I18nAuto:
+    def __init__(self, language=None):
+        self.language_list = LANGUAGE_LIST
+        self.language_all = LANGUAGE_ALL
+        self.language_map = {}
+        self.language = language or locale.getdefaultlocale()[0]
+        if self.language not in self.language_list:
+            self.language = 'zh_CN'
+        self.read_language(self.language_all['zh_CN'])
+        while self.language_all[self.language]['SUPER'] != 'END':
+            self.read_language(self.language_all[self.language])
+            self.language = self.language_all[self.language]['SUPER']
+
+    def read_language(self, lang_dict: dict):
+        self.language_map.update(lang_dict)
+
+    def __call__(self, key):
+        return self.language_map[key]
+
+if __name__ == "__main__":
+    i18n = I18nAuto()
+    webui = WebUI()
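
For reference, `I18nAuto` resolves keys by loading the `zh_CN` table first and then overlaying each table up the `SUPER` chain, so any key missing from `en_US` quietly falls back to its Chinese value. A minimal usage sketch (importing the file above as module `app` is an assumption about its name):

```python
# Minimal sketch of the fallback behaviour; the module name "app" is an assumption.
from app import I18nAuto

i18n = I18nAuto(language='en_US')  # loads zh_CN first, then overlays en_US
print(i18n('就绪'))                # -> 'Ready' (overridden by en_US)
print(i18n('开始推理'))            # -> 'Start inference'
```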
colab.ipynb ADDED
@@ -0,0 +1,374 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "SggegFslkbbK"
+      },
+      "source": [
+        "https://github.com/PlayVoice/so-vits-svc-5.0/\n",
+        "\n",
+        "↑ original repository\n",
+        "\n",
+        "*How to keep a Colab session connected:* https://zhuanlan.zhihu.com/p/144629818\n",
+        "\n",
+        "Preview build; inference can be run with the bundled pretrained model"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "M1MdDryJP73G"
+      },
+      "source": [
+        "# **Environment setup & required downloads**\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "xfJWCr_EkO2i"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Check which GPU you were allocated (usually a T4)\n",
+        "!nvidia-smi"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "nMspj8t3knR6"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Clone the GitHub repository\n",
+        "!git clone https://github.com/PlayVoice/so-vits-svc-5.0/ -b bigvgan-mix-v2"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Kj2j81K6kubj"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Install dependencies & download required files\n",
+        "%cd /content/so-vits-svc-5.0\n",
+        "\n",
+        "!pip install -r requirements.txt\n",
+        "!pip install --upgrade pip setuptools numpy numba\n",
+        "\n",
+        "!wget -P hubert_pretrain/ https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt\n",
+        "!wget -P whisper_pretrain/ https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt\n",
+        "!wget -P speaker_pretrain/ https://github.com/PlayVoice/so-vits-svc-5.0/releases/download/dependency/best_model.pth.tar\n",
+        "!wget -P crepe/assets https://github.com/PlayVoice/so-vits-svc-5.0/releases/download/dependency/full.pth\n",
+        "!wget -P vits_pretrain https://github.com/PlayVoice/so-vits-svc-5.0/releases/download/5.0/sovits5.0.pretrain.pth"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "v9zHS9VXly9b"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Mount Google Drive\n",
+        "from google.colab import drive\n",
+        "drive.mount('/content/drive')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hZ5KH8NgQ7os"
+      },
+      "source": [
+        "# Multi-speaker inference preview"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "2o6m3D0IsphU"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Extract the content encoding (PPG)\n",
+        "\n",
+        "#@markdown **Upload the prepared \" .wav \" source file to the Drive root, then edit the options below**\n",
+        "\n",
+        "#@markdown **\" .wav \" file [file name]**\n",
+        "input = \"\\u30AE\\u30BF\\u30FC\\u3068\\u5B64\\u72EC\\u3068\\u84BC\\u3044\\u60D1\\u661F\" #@param {type:\"string\"}\n",
+        "input_path = \"/content/drive/MyDrive/\"\n",
+        "input_name = input_path + input\n",
+        "!PYTHONPATH=. python whisper/inference.py -w {input_name}.wav -p test.ppg.npy"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "A7nvX5mRlwJ7"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Inference\n",
+        "\n",
+        "#@markdown **Upload the prepared \" .wav \" source file to the Drive root, then edit the options below**\n",
+        "\n",
+        "#@markdown **\" .wav \" file [file name]**\n",
+        "input = \"\\u30AE\\u30BF\\u30FC\\u3068\\u5B64\\u72EC\\u3068\\u84BC\\u3044\\u60D1\\u661F\" #@param {type:\"string\"}\n",
+        "input_path = \"/content/drive/MyDrive/\"\n",
+        "input_name = input_path + input\n",
+        "#@markdown **Speaker ID (0001–0056; 0022, 0030, 0047 and 0051 recommended)**\n",
+        "speaker = \"0002\" #@param {type:\"string\"}\n",
+        "!PYTHONPATH=. python svc_inference.py --config configs/base.yaml --model vits_pretrain/sovits5.0.pretrain.pth --spk ./configs/singers/singer{speaker}.npy --wave {input_name}.wav --ppg test.ppg.npy"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "F8oerogXyd3u"
+      },
+      "source": [
+        "The inference result is saved in the repository root as svc_out.wav"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "qKX17GElPuso"
+      },
+      "source": [
+        "# Training"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "sVe0lEGWQBLU"
+      },
+      "source": [
+        "Clip the audio into segments shorter than 30 seconds, loudness-match them, and convert them to mono. Preprocessing resamples everything, so the source sample rate does not matter (although deliberately downsampling first will degrade your data quality).\n",
+        "\n",
+        "**Adobe Audition™'s loudness-match feature can do the resampling, channel conversion and loudness matching in one pass.**\n",
+        "\n",
+        "Then arrange the audio files in the following structure:\n",
+        "```\n",
+        "dataset_raw\n",
+        "├───speaker0\n",
+        "│   ├───xxx1-xxx1.wav\n",
+        "│   ├───...\n",
+        "│   └───Lxx-0xx8.wav\n",
+        "└───speaker1\n",
+        "    ├───xx2-0xxx2.wav\n",
+        "    ├───...\n",
+        "    └───xxx7-xxx007.wav\n",
+        "```\n",
+        "\n",
+        "Pack it as a zip named data.zip and upload it to the Drive root."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "vC8IthV8VYgy"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Fetch the dataset from Drive\n",
+        "!unzip -d /content/so-vits-svc-5.0/ /content/drive/MyDrive/data.zip # adjust the path and file name as needed"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "J101PiFUSL1N"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Resample\n",
+        "# Generate 16000 Hz audio under ./data_svc/waves-16k\n",
+        "!python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-16k -s 16000\n",
+        "# Generate 32000 Hz audio under ./data_svc/waves-32k\n",
+        "!python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-32k -s 32000"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "ZpxeYJCBSbgf"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Extract f0\n",
+        "!python prepare/preprocess_f0.py -w data_svc/waves-16k/ -p data_svc/pitch"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "7VasDGhDSlP5"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Extract whisper content encodings from the 16 kHz audio\n",
+        "!PYTHONPATH=. python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "#@title Extract hubert content vectors from the 16 kHz audio\n",
+        "!PYTHONPATH=. python prepare/preprocess_hubert.py -w data_svc/waves-16k/ -v data_svc/hubert"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "ovRqQUINSoII"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Extract speaker (timbre) features\n",
+        "!PYTHONPATH=. python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "s8Ba8Fd1bzzX"
+      },
+      "outputs": [],
+      "source": [
+        "# (fixes errors caused by \".ipynb_checkpoints\" directories)\n",
+        "!rm -rf $(find . -type d -name .ipynb_checkpoints)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "ic9q599_b0Ae"
+      },
+      "outputs": [],
+      "source": [
+        "# (fixes errors caused by \".ipynb_checkpoints\" directories)\n",
+        "!rm -rf .ipynb_checkpoints\n",
+        "!find . -name \".ipynb_checkpoints\" -exec rm -rf {} \\\\;"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "QamG3_B6o3vF"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Extract the averaged timbre\n",
+        "!PYTHONPATH=. python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "3wBmyQHvSs6K"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Extract spectrograms\n",
+        "!PYTHONPATH=. python prepare/preprocess_spec.py -w data_svc/waves-32k/ -s data_svc/specs"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "tUcljCLbS5O3"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Generate the training index\n",
+        "!python prepare/preprocess_train.py"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "30fXnscFS7Wo"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Sanity-check the training files\n",
+        "!PYTHONPATH=. python prepare/preprocess_zzz.py"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "hacR8qDFVOWo"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Set up checkpoint backup\n",
+        "#@markdown **Back up checkpoints to Drive (recommended, since Colab sessions can die at any moment); saved to the Sovits5.0 folder in the Drive root by default**\n",
+        "Save_to_drive = True #@param {type:\"boolean\"}\n",
+        "if Save_to_drive:\n",
+        " !mkdir -p /content/so-vits-svc-5.0/chkpt/\n",
+        " !rm -rf /content/so-vits-svc-5.0/chkpt/\n",
+        " !mkdir -p /content/drive/MyDrive/Sovits5.0\n",
+        " !ln -s /content/drive/MyDrive/Sovits5.0 /content/so-vits-svc-5.0/chkpt"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "5BIiKIAoU3Kd"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Start training\n",
+        "%load_ext tensorboard\n",
+        "%tensorboard --logdir /content/so-vits-svc-5.0/logs/\n",
+        "\n",
+        "!PYTHONPATH=. python svc_trainer.py -c configs/base.yaml -n sovits5.0"
+      ]
+    }
+  ],
+  "metadata": {
+    "accelerator": "GPU",
+    "colab": {
+      "provenance": []
+    },
+    "gpuClass": "standard",
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
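
The dataset-preparation cell above only describes the target layout; a small helper like the following can verify it before zipping. This is a sketch under the assumptions that clips live in `dataset_raw/<speaker>/*.wav` and that `soundfile` is available (it is already used by `app.py`); the function name is hypothetical:

```python
# Hypothetical helper: validate dataset_raw before packing data.zip.
import os
import soundfile

def check_dataset(root="dataset_raw", max_seconds=30.0):
    for speaker in sorted(os.listdir(root)):
        spk_dir = os.path.join(root, speaker)
        if not os.path.isdir(spk_dir):
            continue
        for wav in sorted(os.listdir(spk_dir)):
            if not wav.endswith(".wav"):
                print(f"skip (not wav): {speaker}/{wav}")
                continue
            data, sr = soundfile.read(os.path.join(spk_dir, wav))
            seconds = len(data) / sr
            if seconds > max_seconds:
                print(f"too long ({seconds:.1f}s): {speaker}/{wav}")
            if data.ndim != 1:
                print(f"not mono: {speaker}/{wav}")

check_dataset()
```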
configs/base.yaml ADDED
@@ -0,0 +1,72 @@
+train:
+  model: "sovits"
+  seed: 1234
+  epochs: 10000
+  learning_rate: 5e-5
+  betas: [0.8, 0.99]
+  lr_decay: 0.999875
+  eps: 1e-9
+  batch_size: 8
+  accum_step: 2
+  c_stft: 9
+  c_mel: 1.
+  c_kl: 0.2
+  port: 8001
+  pretrain: "./vits_pretrain/sovits5.0.pretrain.pth"
+#############################
+data:
+  training_files: "files/train.txt"
+  validation_files: "files/valid.txt"
+  segment_size: 8000  # WARNING: must stay consistent with hop_length
+  max_wav_value: 32768.0
+  sampling_rate: 32000
+  filter_length: 1024
+  hop_length: 320
+  win_length: 1024
+  mel_channels: 100
+  mel_fmin: 50.0
+  mel_fmax: 16000.0
+#############################
+vits:
+  ppg_dim: 1280
+  vec_dim: 256
+  spk_dim: 256
+  gin_channels: 256
+  inter_channels: 192
+  hidden_channels: 192
+  filter_channels: 640
+#############################
+gen:
+  upsample_input: 192
+  upsample_rates: [5,4,4,2,2]
+  upsample_kernel_sizes: [15,8,8,4,4]
+  upsample_initial_channel: 320
+  resblock_kernel_sizes: [3,7,11]
+  resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
+#############################
+mpd:
+  periods: [2,3,5,7,11]
+  kernel_size: 5
+  stride: 3
+  use_spectral_norm: False
+  lReLU_slope: 0.2
+#############################
+mrd:
+  resolutions: "[(1024, 120, 600), (2048, 240, 1200), (4096, 480, 2400), (512, 50, 240)]" # (filter_length, hop_length, win_length)
+  use_spectral_norm: False
+  lReLU_slope: 0.2
+#############################
+log:
+  info_interval: 100
+  eval_interval: 1
+  save_interval: 5
+  num_audio: 6
+  pth_dir: 'chkpt'
+  log_dir: 'logs'
+  keep_ckpts: 0
+#############################
+dist_config:
+  dist_backend: "nccl"
+  dist_url: "tcp://localhost:54321"
+  world_size: 1
+
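
The sections above map directly onto nested dictionaries; for example `segment_size: 8000` corresponds to 8000 / 320 = 25 hops, which is why its comment warns it must stay consistent with `hop_length`. A minimal loading sketch follows (PyYAML is an assumption; the repo may wrap this in its own hparams helper):

```python
# Minimal sketch: read configs/base.yaml with PyYAML (loader choice is an assumption).
import yaml

with open("configs/base.yaml") as f:
    cfg = yaml.safe_load(f)

# NOTE: under YAML 1.1 rules, PyYAML parses "5e-5" (no decimal point) as a string,
# so cast defensively before use.
lr = float(cfg["train"]["learning_rate"])
frames = cfg["data"]["segment_size"] // cfg["data"]["hop_length"]
print(lr, frames)  # 5e-05 25
```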
configs/singers/singer0001.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e2879921d43bdbf11fc5d6ac91f434f905a2c5e59d75368bfbf3c6bdbddcb3cf
+size 1152
configs/singers/singer0002.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fbe5c7925c2fdb514e2c5b450de1d2737ec7f86f1c65eeb488c1888c0b9a7069
+size 1152
configs/singers/singer0003.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5665126aeb6c6fab89c79b90debf2ce2e64b321076dcb414089eff8848eac8b4
+size 1152
configs/singers/singer0004.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:79f0fe5993e9adcaeae25b0fa68265d40c9c1b5539ca12d6e438477de2177819
+size 1152
configs/singers/singer0005.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1158fb447929cf9400a31675cf9992fd3ed7558e061562189d9e6bf56d83fb2a
+size 1152
configs/singers/singer0006.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:06c1fd3a9afaa7944e4b81b7ca787e667b0dae8c7e90c6d24177245449f4e940
+size 1152
configs/singers/singer0007.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:36611b9e57545332b9fb97fd35a356fbe8d60258f2f5e2232168481bb6dfab5b
+size 1152
configs/singers/singer0008.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8584ad6f3569a1307082cd410085d9a562807e962274b89b72487c7bc79124d4
+size 1152
configs/singers/singer0009.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b069db4e3e5ca389ffba974c74eab46caf4c60545773e5f7e5e253310619073e
+size 1152
configs/singers/singer0010.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7d4d92735e4bac1618e89198d113013db09061b6c1f74ba0c500b70b097cd407
+size 1152
configs/singers/singer0011.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:942388b4276dc06ee365f59c324ce1642e4bf810dcc99992739787e3b9ad135d
+size 1152
configs/singers/singer0012.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3411efcf4ee4f534cea2b742c2eca166ae971efbceab21fb41b77b8923a1ba3a
+size 1152
configs/singers/singer0013.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6e8e30cd1bce61405db194278dd7bf207d16abf656dd22f9a20f29e3657674f3
+size 1152
configs/singers/singer0014.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f9cc8200753b4ba7605c9a13bf454b100025965135c5d816f7440ec53a2e6dd4
+size 1152
configs/singers/singer0015.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dcb58688e51dbdeb22e5dd85d27ff3904c4594c78420b8e9c9ab481adbecc5fe
+size 1152
configs/singers/singer0016.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:66a3c6162b8c937e9e8bbdc806b873866afce4b110664831642f7b41922bbf39
+size 1152
configs/singers/singer0017.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:84782c98c930bd980f350837f4b3e8e193c49ef46aef9f92471c6136659975a9
+size 1152
configs/singers/singer0018.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:731ebafda06aecedfd79941978149a0f87595f04e24eab7ed5300defe9070fc0
+size 1152
configs/singers/singer0019.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3d88e620994e4413c4c58ffb9239ef46ded60ff3eab0715c7af96cbe4092198f
+size 1152
configs/singers/singer0020.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3e5abaabe5457a20161351dcf5f8737d63a2a92fb1de1842ea9e92e47b9ca6fe
+size 1152
configs/singers/singer0021.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1d7f99c92c89a44c1f2dd0688f033f0593c8c88b0537b092928bfbaa63a8d3e9
+size 1152
configs/singers/singer0022.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:33becb1da48b12ba4957a0ef0b25bbd51e100d5762ebc4c7d381f6b957e682a2
+size 1152
configs/singers/singer0023.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7f49cbaf3f7653f48f80854a513a334f31dca719a09cca66e257995ce4a741a9
+size 1152
configs/singers/singer0024.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:92ed584994d56473c8bab0799d213e927c5a2928facef2b93a2f95f764d868b4
+size 1152
configs/singers/singer0025.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:14b7e1f55393d5beaa2f3bbd0ef7f2be7e108993c680acb265ff24df19f7062b
+size 1152
configs/singers/singer0026.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:92ecc9aa68f136960c00e98aaca16e92c38960bc7eb9687aee90190972974726
+size 1152
configs/singers/singer0027.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f5a8a1c2a445179d38664fb55c84ee9a36350beee50efa9f850d29b394447bfa
+size 1152
configs/singers/singer0028.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b79b8266c8d368dc99f49a347b2631e1e5cfb44056b5a9ab4470b42f9851ee35
+size 1152
configs/singers/singer0029.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:60fa5fd9e8ba14d7f6d67304842f16382f7d2e739969bde9551222ff8c282775
+size 1152
configs/singers/singer0030.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2f5070e4196c91fa713aed20aedb2a570a7b2ad8301ee61f59821dafaea3c6a7
+size 1152
configs/singers/singer0031.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:47f4f8c065be1c5448c1b80e5c99087e7357cf1f8a8a55f2d844ccf1ca4931e6
+size 1152
configs/singers/singer0032.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:019f40cf49cb7ccb44fb9c6a9f6345e84f837185a1642623144b4e2969c8738b
+size 1152
configs/singers/singer0033.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2e05e212c93fc9e7b13174dd76721ee891bb4ea8bb1638a4c43523ed65d30f67
+size 1152
configs/singers/singer0034.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:715a089dd9b3e5cbf021b0f41055f59208911e49cccf375ecf8b82544f325c3d
+size 1152
configs/singers/singer0035.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9af8cd05182ec53ff573bce53dad049759bea1de5656915f414910eaf47f61ed
+size 1152
configs/singers/singer0036.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3cec474244d86acfd24d6abf7e033b24b40b838cba2fcd3b4d0e5611313d67ef
+size 1152
configs/singers/singer0037.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:316e3435d373e352fe95fcb2ec0ab1c8afdeb270ce9f13c940ba91187eecdcf3
+size 1152
configs/singers/singer0038.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e6458e251512dab86abce504490de6762f9c2de66ddbc853c24c3d05eb39c96
+size 1152
configs/singers/singer0039.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c2e484ae33eef7ac92dd784e9e3b9bca7e6c0838d50b43c674da47620f281f20
+size 1152
configs/singers/singer0040.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7b3a104163ad4cf87caff70b845b2c3e70190ce430a8f21247d350ef102071dc
+size 1152
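
Each singer .npy entry above is a Git LFS pointer rather than the array itself: `oid sha256:` names the stored blob and `size` gives its byte count (1152 bytes is consistent with a 256-float32 speaker embedding plus a 128-byte .npy header, matching `spk_dim: 256` in configs/base.yaml, though that interpretation is an inference, not something the diff states). A small parser sketch (the helper name is hypothetical):

```python
# Hypothetical helper: parse a Git LFS pointer file into a dict of its fields.
def parse_lfs_pointer(path):
    fields = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

ptr = parse_lfs_pointer("configs/singers/singer0001.npy")
print(ptr["version"])    # https://git-lfs.github.com/spec/v1
print(ptr["oid"])        # sha256:e2879921...
print(int(ptr["size"]))  # 1152
```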