---
license: creativeml-openrail-m
language:
- en
pipeline_tag: audio-to-audio
tags:
- voice-to-voice
- ddsp-svc
---

These are *example* models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC). All of them were trained on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they generally hold up fairly well in a variety of other languages.

All models are sampled at 44.1 kHz.

- PrimReaper - Trained on YouTube content from the popular YouTuber "The Prim Reaper"
- Panam - Trained on dialogue audio extracted from the Cyberpunk 2077 character "Panam"
- V-F - Trained on dialogue audio extracted from the female "V" character in Cyberpunk 2077
- Nora - Trained on Fallout 4 dialogue audio from the game character "Nora"

If you are using DDSP-SVC's gui.py, keep in mind that pitch adjustment is probably required if your voice is deeper than the character's. For realtime inference, my settings are generally as follows:

- Pitch: 10-15, depending on the model
- Segmentation size: 0.70
- Cross-fade duration: 0.06
- Historical blocks used: 6
- f0 extractor: rmvpe
- Phase vocoder: depends on the model and your preference; enable it if the model output feels robotic or stuttery, disable it if it sounds "buttery"
- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on the model
- Encode silence: depends on the model and your preference; might be best on, might be best off
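As a rough way to pick a starting pitch value: assuming the pitch setting works in semitones (as is common in SVC tools), the offset between your voice and the target can be estimated as 12·log2(f_target/f_source) from the two typical f0 values. A minimal sketch, with illustrative frequencies that are not measured from these models:

```python
import math

def semitone_offset(source_hz: float, target_hz: float) -> float:
    """Semitones needed to shift a source f0 up (or down) to a target f0."""
    return 12 * math.log2(target_hz / source_hz)

# Illustrative values only: a ~110 Hz speaking voice targeting a ~220 Hz
# voice needs roughly a one-octave (12 semitone) shift upward.
print(round(semitone_offset(110.0, 220.0)))  # 12
```

In practice you would still tweak the result by ear, but it explains why a deeper voice targeting these models tends to need a pitch setting in the 10-15 range.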