---
license: creativeml-openrail-m
language:
- en
pipeline_tag: audio-to-audio
tags:
- voice-to-voice
- ddsp-svc
---

These are *example* models I made using (and for use with) [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC).

All examples are based on samples from an English speaker, though thanks to [DDSP](https://magenta.tensorflow.org/ddsp), they generally hold up fairly well in a variety of other languages.

All models are sampled at 44.1 kHz.
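
If your source audio is at a different rate, it's probably worth resampling it to 44.1 kHz before conversion. Here's a minimal preprocessing sketch using librosa/soundfile (file names are placeholders; this is just my suggested prep step, not part of DDSP-SVC itself):

```python
# Minimal preprocessing sketch: resample input audio to 44.1 kHz to match
# the models' training sample rate. File names are placeholders.
import librosa
import soundfile as sf

y, sr = librosa.load("input.wav", sr=None)  # load at the file's native rate
if sr != 44100:
    y = librosa.resample(y, orig_sr=sr, target_sr=44100)
sf.write("input_44k.wav", y, 44100)
```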

- PrimReaper - Trained on YouTube content from the popular YouTuber "The Prim Reaper"
- Panam - Trained on extracted dialogue audio for the Cyberpunk 2077 character "Panam"
- V-F - Trained on extracted dialogue audio for the female "V" character in Cyberpunk 2077
- Nora - Trained on dialogue audio for the Fallout 4 character "Nora"

If you're using DDSP-SVC's gui.py, keep in mind that pitch adjustment is probably required if your voice is deeper than the character's.
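
As a rough way to pick a starting pitch offset: assuming the pitch value maps to semitones (as it does in most SVC tools), the gap between two average f0 values converts as `12 * log2(target / source)`. A quick sketch with made-up f0 numbers:

```python
# Rough starting point for the pitch offset, assuming the value is in semitones.
# The f0 figures below are made-up placeholders; measure your own average f0
# and the character's for a real estimate.
import math

source_f0 = 110.0  # Hz, a fairly deep speaking voice
target_f0 = 220.0  # Hz, the character's typical pitch

offset = 12 * math.log2(target_f0 / source_f0)
print(f"Suggested pitch offset: about {offset:.0f} semitones")  # ~12
```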

For realtime inference, my settings are generally as follows (there's a quick dict-style summary after the list):

- Pitch: 10 - 15 depending on model
- Segmentation Size: 0.70
- Cross fade duration: 0.06
- Historical blocks used: 6
- f0Extractor: rmvpe
- Phase vocoder: Depends on the model and preference; enable it if the output feels robotic/stuttery, disable it if it sounds "buttery"
- K-steps: 200
- Speedup: 10
- Diffusion method: ddim or pndm, depending on model
- Encode silence: Depends on the model and preference; might be best on, might be best off.
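
For reference, here are the same settings written out as a plain Python dict. The key names are my own shorthand for the GUI labels, not the actual config fields used by DDSP-SVC's gui.py:

```python
# Illustrative summary of the settings above; key names are my shorthand,
# not the actual fields used by DDSP-SVC's gui.py.
realtime_settings = {
    "pitch": 12,                  # 10-15 depending on model
    "segmentation_size": 0.70,
    "crossfade_duration": 0.06,
    "historical_blocks": 6,
    "f0_extractor": "rmvpe",
    "phase_vocoder": False,       # enable if output feels robotic/stuttery
    "k_steps": 200,
    "speedup": 10,
    "diffusion_method": "ddim",   # or "pndm", depending on model
    "encode_silence": True,       # model/preference dependent
}
```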