{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import torch\n",
"from openvoice import se_extractor\n",
"from openvoice.api import ToneColorConverter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialization\n",
"\n",
"In this example, we use the OpenVoice V2 checkpoints. OpenVoice V2 was trained with more aggressive data augmentation and therefore shows better robustness in some cases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
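"# Paths to the OpenVoice V2 converter checkpoint and the output folder\n",
"ckpt_converter = 'checkpoints_v2/converter'\n",
"output_dir = 'outputs_v2'\n",
"# Use the first GPU if one is available, otherwise fall back to the CPU\n",
"device = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n",
"\n",
"# Build the tone color converter from its config, then load the trained weights\n",
"tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n",
"tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n",
"\n",
"os.makedirs(output_dir, exist_ok=True)"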
]
},
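{
"cell_type": "markdown",
"metadata": {},
"source": [
"To fail early with an explicit error when the converter checkpoints have not been downloaded, the two files loaded above can be checked first. This is a minimal sketch that assumes the `checkpoints_v2/converter` layout used in the initialization cell; `check_converter_ckpt` is a hypothetical helper, not part of OpenVoice. Call `check_converter_ckpt(ckpt_converter)` before constructing `ToneColorConverter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"\n",
"def check_converter_ckpt(ckpt_dir):\n",
"    # Hypothetical helper: return the converter file paths, raising if any is missing.\n",
"    # File names are taken from the initialization cell above.\n",
"    required = ['config.json', 'checkpoint.pth']\n",
"    paths = [os.path.join(ckpt_dir, name) for name in required]\n",
"    missing = [p for p in paths if not os.path.isfile(p)]\n",
"    if missing:\n",
"        raise FileNotFoundError(f'Missing converter checkpoint files: {missing}')\n",
"    return paths"
]
},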
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtain Tone Color Embedding\n",
"\n",
"We only extract the tone color embedding of the target speaker (the voice to clone). The source tone color embeddings of the MeloTTS base speakers can be loaded directly from the `checkpoints_v2/base_speakers/ses` folder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is the voice you want to clone\n",
"reference_speaker = 'resources/example_reference.mp3'\n",
"# Extract the target speaker's tone color embedding from the reference clip\n",
"target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)"
]
},
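{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each base speaker's source embedding is stored as a separate `.pth` file. The conversion loop further down derives the file name from the MeloTTS speaker key by lowercasing it and replacing underscores with hyphens, so a key such as `EN_INDIA` maps to `en-india.pth`. The sketch below reproduces that mapping and reports keys whose file is missing; it assumes the `checkpoints_v2/base_speakers/ses` layout used in the loop, and `source_se_path` and `missing_source_ses` are hypothetical helpers. For example, `missing_source_ses(model.hps.data.spk2id.keys())` returns an empty list when every speaker of a loaded model can be converted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"SES_DIR = 'checkpoints_v2/base_speakers/ses'  # layout assumed from the conversion loop\n",
"\n",
"\n",
"def source_se_path(speaker_key, ses_dir=SES_DIR):\n",
"    # Hypothetical helper: same normalization as the loop (lowercase, '_' -> '-')\n",
"    se_name = speaker_key.lower().replace('_', '-')\n",
"    return f'{ses_dir}/{se_name}.pth'\n",
"\n",
"\n",
"def missing_source_ses(speaker_keys, ses_dir=SES_DIR):\n",
"    # Hypothetical helper: speaker keys whose embedding file does not exist under ses_dir\n",
"    return [k for k in speaker_keys if not os.path.isfile(source_se_path(k, ses_dir))]"
]
},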
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use MeloTTS as Base Speakers\n",
"\n",
"MeloTTS is a high-quality multilingual text-to-speech library by MyShell.ai. It supports English (American, British, Indian, Australian, and Default accents), Spanish, French, Chinese, Japanese, and Korean. In the following example, we use the MeloTTS models as base speakers: each one synthesizes the text in its own voice, and the tone color converter then replaces the base speaker's tone color with that of the reference speaker."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from melo.api import TTS\n",
"\n",
"texts = {\n",
" 'EN_NEWEST': \"Did you ever hear a folk tale about a giant turtle?\", # The newest English base speaker model\n",
" 'EN': \"Did you ever hear a folk tale about a giant turtle?\",\n",
" 'ES': \"El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\",\n",
" 'FR': \"La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\",\n",
" 'ZH': \"在这次vacation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。\",\n",
" 'JP': \"彼は毎朝ジョギングをして体を健康に保っています。\",\n",
" 'KR': \"안녕하세요! 오늘은 날씨가 정말 좋네요.\",\n",
"}\n",
"\n",
"# Intermediate base-speaker audio; it is overwritten for each speaker\n",
"src_path = f'{output_dir}/tmp.wav'\n",
"\n",
"# Speech speed is adjustable (1.0 is the default rate)\n",
"speed = 1.0\n",
"\n",
"# Message embedded by the converter (used for OpenVoice's audio watermark when it is enabled)\n",
"encode_message = '@MyShell'\n",
"\n",
"for language, text in texts.items():\n",
"    model = TTS(language=language, device=device)\n",
"    speaker_ids = model.hps.data.spk2id\n",
"\n",
"    for speaker_key, speaker_id in speaker_ids.items():\n",
"        # Embedding files are named after the lowercased speaker key, with '_' replaced by '-'\n",
"        se_name = speaker_key.lower().replace('_', '-')\n",
"        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{se_name}.pth', map_location=device)\n",
"\n",
"        # Synthesize the text with the base speaker, then convert it to the target tone color\n",
"        model.tts_to_file(text, speaker_id, src_path, speed=speed)\n",
"        save_path = f'{output_dir}/output_v2_{se_name}.wav'\n",
"        tone_color_converter.convert(\n",
"            audio_src_path=src_path,\n",
"            src_se=source_se,\n",
"            tgt_se=target_se,\n",
"            output_path=save_path,\n",
"            message=encode_message)"
]
}
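,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the loop finishes, `outputs_v2` should contain one `output_v2_<speaker>.wav` file per base speaker, next to the intermediate `tmp.wav`. A minimal sketch for listing the converted files, assuming the `save_path` pattern used above; `list_converted_outputs` is a hypothetical helper. In Jupyter, a result can then be played with `IPython.display.Audio(filename=...)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import os\n",
"\n",
"\n",
"def list_converted_outputs(output_dir):\n",
"    # Hypothetical helper: file names matching the loop's save_path pattern, sorted.\n",
"    # The intermediate tmp.wav does not match the pattern and is therefore excluded.\n",
"    pattern = os.path.join(output_dir, 'output_v2_*.wav')\n",
"    return sorted(os.path.basename(p) for p in glob.glob(pattern))"
]
}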
],
"metadata": {
"kernelspec": {
"display_name": "melo",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}