Amphion Singing Voice Conversion Pretrained Models

Quick Start

We provide a pretrained DiffWaveNetSVC checkpoint for you to try. It is trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:

| Singer | Language | Training Duration (mins) |
| --- | --- | --- |
| David Tao 陶喆 | Chinese | 45.51 |
| Eason Chan 陈奕迅 | Chinese | 43.36 |
| Feng Wang 汪峰 | Chinese | 41.08 |
| Jian Li 李健 | Chinese | 38.90 |
| John Mayer | English | 30.83 |
| Adele | English | 27.23 |
| Ying Na 那英 | Chinese | 27.02 |
| Yijie Shi 石倚洁 | Chinese | 24.93 |
| Jacky Cheung 张学友 | Chinese | 18.31 |
| Taylor Swift | English | 18.31 |
| Faye Wong 王菲 | English | 16.78 |
| Michael Jackson | English | 15.13 |
| Tsai Chin 蔡琴 | Chinese | 10.12 |
| Bruno Mars | English | 6.29 |
| Beyonce | English | 6.06 |

To make these singers sing the songs you want to listen to, just run the following commands:

Step 1: Download the acoustic model checkpoint

git lfs install
git clone https://huggingface.co/amphion/singing_voice_conversion

Step 2: Download the vocoder checkpoint

git clone https://huggingface.co/amphion/BigVGAN_singing_bigdata
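
If git lfs was not active when the repositories were cloned, large files are checked out as small text pointer stubs instead of real checkpoints. The sketch below is one way to spot that. The function names is_lfs_pointer and find_lfs_pointers are hypothetical helpers, not part of Amphion or Git LFS, and the check relies only on the fact that an LFS pointer file begins with a line naming the LFS spec URL.

```shell
# Returns 0 if the file looks like a Git LFS pointer stub rather than real content.
# (Hypothetical helper; pointer files are short text files whose first line
# starts with "version https://git-lfs.github.com/spec/v1".)
is_lfs_pointer() {
    head -c 200 "$1" 2>/dev/null \
        | LC_ALL=C grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Prints every pointer stub found under directory $1, skipping .git internals.
find_lfs_pointers() {
    find "$1" -type f ! -path '*/.git/*' | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}

# Example (run from the directory that holds both clones):
#   find_lfs_pointers singing_voice_conversion
#   find_lfs_pointers BigVGAN_singing_bigdata
```

If either call prints any paths, running git lfs pull inside that clone should fetch the real files.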

Step 3: Clone the Amphion source code from GitHub

git clone https://github.com/open-mmlab/Amphion.git

Step 4: Download the ContentVec checkpoint

You can download the ContentVec checkpoint from this repo. This pretrained model uses checkpoint_best_legacy_500.pt, which is the legacy ContentVec with 500 classes.

Step 5: Specify the checkpoint paths

Use symbolic links to point to the downloaded checkpoints:

cd Amphion
mkdir -p ckpts/svc
ln -s "$(realpath ../singing_voice_conversion/vocalist_l1_contentvec+whisper)" ckpts/svc/vocalist_l1_contentvec+whisper
ln -s "$(realpath ../BigVGAN_singing_bigdata/bigvgan_singing)" pretrained/bigvgan_singing
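
The commands above pass absolute paths from realpath because a relative symlink target is resolved against the directory that contains the link, not against the current directory, so a relative target could easily end up dangling. A minimal sketch for confirming that both links resolve is shown below. check_link is a hypothetical helper name, not an Amphion command.

```shell
# Hypothetical helper: succeed only if $1 is a symlink whose target exists.
check_link() {
    if [ ! -L "$1" ]; then
        echo "not a symlink: $1" >&2
        return 1
    fi
    # -e follows the link, so it fails when the target is missing.
    if [ ! -e "$1" ]; then
        echo "dangling symlink: $1" >&2
        return 1
    fi
    echo "ok: $1"
}

# Example (run from the Amphion root):
#   check_link "ckpts/svc/vocalist_l1_contentvec+whisper"
#   check_link "pretrained/bigvgan_singing"
```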

You also need to move the checkpoint_best_legacy_500.pt file you downloaded in Step 4 into Amphion/pretrained/contentvec.
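
The move can be sketched as a small shell function. place_contentvec is a hypothetical helper name, and the download location in the example comment is only an assumption, since it depends on where you saved the file. The destination directory is the one named above.

```shell
# Hypothetical helper: move the ContentVec checkpoint into the Amphion tree.
# $1: path to the downloaded checkpoint_best_legacy_500.pt
# $2: path to the Amphion checkout (defaults to the current directory)
place_contentvec() {
    src="$1"
    amphion_root="${2:-.}"
    if [ ! -f "$src" ]; then
        echo "checkpoint not found: $src" >&2
        return 1
    fi
    # Create the destination directory if it does not exist yet.
    mkdir -p "$amphion_root/pretrained/contentvec"
    mv "$src" "$amphion_root/pretrained/contentvec/"
}

# Example (the source path is an assumption; adjust to your download folder):
#   place_contentvec "$HOME/Downloads/checkpoint_best_legacy_500.pt" .
```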

Step 6: Conversion

You can follow this recipe to run the conversion. For example, to make Taylor Swift sing the songs in [Your Audios Folder], run:

sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
    --config "ckpts/svc/vocalist_l1_contentvec+whisper/args.json" \
    --infer_expt_dir "ckpts/svc/vocalist_l1_contentvec+whisper" \
    --infer_output_dir "ckpts/svc/vocalist_l1_contentvec+whisper/result" \
    --infer_source_audio_dir [Your Audios Folder] \
    --infer_vocoder_dir "pretrained/bigvgan_singing" \
    --infer_target_speaker "vocalist_l1_TaylorSwift" \
    --infer_key_shift "autoshift"
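
To try several target singers or audio folders without retyping the flags, the command above can be wrapped in a function. This is only a convenience sketch: svc_convert is a hypothetical name, every flag and path is copied from the command above, and only the target speaker, the source folder, and the GPU id are parameterized.

```shell
# Hypothetical wrapper around the conversion command; run it from the Amphion root.
# $1: target speaker, e.g. vocalist_l1_TaylorSwift
# $2: folder containing the source audio
# $3: GPU id (optional, defaults to 0)
svc_convert() {
    if [ "$#" -lt 2 ]; then
        echo "usage: svc_convert <target_speaker> <source_audio_dir> [gpu]" >&2
        return 2
    fi
    expt_dir="ckpts/svc/vocalist_l1_contentvec+whisper"
    sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "${3:-0}" \
        --config "$expt_dir/args.json" \
        --infer_expt_dir "$expt_dir" \
        --infer_output_dir "$expt_dir/result" \
        --infer_source_audio_dir "$2" \
        --infer_vocoder_dir "pretrained/bigvgan_singing" \
        --infer_target_speaker "$1" \
        --infer_key_shift "autoshift"
}

# Example:
#   svc_convert vocalist_l1_TaylorSwift /path/to/your/audios
```

Every call writes to the same result folder, as in the original command, so outputs from different runs may need to be moved aside between calls.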

Note: The supported infer_target_speaker values can be seen here.

Citations

@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}