license: creativeml-openrail-m
language:
  - en
pipeline_tag: audio-to-audio
tags:
  - voice-to-voice
  - ddsp-svc

These are example models I made using (and for use with) DDSP-SVC.

Each model was trained on samples from an English speaker, though thanks to DDSP they generally hold up fairly well when used with a variety of other languages.

All models are sampled at 44.1 kHz.

  • PrimReaper - Trained on YouTube content from popular YouTuber "The Prim Reaper"
  • Panam - Trained on dialogue audio extracted from Cyberpunk 2077 for the character "Panam"
  • V-F - Trained on dialogue audio extracted from Cyberpunk 2077 for the female "V" character
  • Nora - Trained on Fallout 4 dialogue audio from the game character "Nora"

If using DDSP-SVC's gui.py, keep in mind that pitch adjustment is probably required if your voice is deeper than the character's.
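
As a rough guide to how large a given pitch adjustment is, here is a minimal sketch. It assumes the pitch control is a shift in semitones, which is worth checking against your DDSP-SVC version. The helper names `semitone_ratio` and `shifted_f0` are made up for illustration and are not part of DDSP-SVC.

```python
# Sketch only: assumes the pitch setting is a shift in semitones.
# In equal temperament, n semitones scale frequency by 2 ** (n / 12).

def semitone_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` (hypothetical helper)."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(f0_hz: float, semitones: float) -> float:
    """Fundamental frequency after applying the shift (hypothetical helper)."""
    return f0_hz * semitone_ratio(semitones)


if __name__ == "__main__":
    # Print the multiplier for a few example shifts.
    for shift in (10, 12, 15):
        print(f"{shift:+d} semitones -> x{semitone_ratio(shift):.3f}")
```

A shift of 12 semitones is one octave, so it doubles the fundamental frequency. The deeper your voice is relative to the character's, the larger the shift you are likely to need.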

For realtime inference, my settings are generally as follows:

  • Pitch: 10 - 15 depending on model
  • Segmentation Size: 0.70
  • Cross fade duration: 0.06
  • Historical blocks used: 6
  • f0Extractor: rmvpe
  • Phase vocoder: Depends on model and preference; enable it if the output feels robotic or stuttery, disable it if the output sounds "buttery"
  • K-steps: 200
  • Speedup: 10
  • Diffusion method: ddim or pndm, depending on model
  • Encode silence: Depends on the model and preference, might be best on, might be best off.
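
To get a feel for what these numbers mean at 44.1 kHz, here is a small sketch. It rests on two assumptions that are not confirmed above: that segmentation size and cross fade duration are given in seconds, and that the speedup value divides the K-steps to give the number of sampler steps actually run. The function names are hypothetical and not part of DDSP-SVC.

```python
# Sketch only: units (seconds) and the k_steps / speedup relationship
# are assumptions, not taken from the DDSP-SVC source.

SAMPLE_RATE_HZ = 44_100  # all models here are sampled at 44.1 kHz

settings = {
    "segmentation_size_s": 0.70,
    "cross_fade_s": 0.06,
    "k_steps": 200,
    "speedup": 10,
}


def seconds_to_samples(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in seconds to a whole number of audio samples."""
    return round(seconds * sample_rate)


def effective_steps(k_steps: int, speedup: int) -> int:
    """Sampler steps actually run, assuming speedup divides k_steps."""
    return k_steps // speedup


if __name__ == "__main__":
    print("segment samples:", seconds_to_samples(settings["segmentation_size_s"]))
    print("cross fade samples:", seconds_to_samples(settings["cross_fade_s"]))
    print("sampler steps:", effective_steps(settings["k_steps"], settings["speedup"]))
```

Under those assumptions, each segment is 30,870 samples, the cross fade covers 2,646 samples, and the diffusion sampler runs 20 steps per segment.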