|
--- |
|
language: |
|
- en |
|
- de |
|
- es |
|
- it |
|
- nl |
|
- pt |
|
- pl |
|
- ro |
|
- sv |
|
- da |
|
- fi |
|
- hu |
|
- el |
|
- fr |
|
- ru |
|
- uk |
|
- tr |
|
- ar |
|
- hi |
|
- ja |
|
- ko |
|
- zh |
|
- vi |
|
- la |
|
- ha |
|
- sw |
|
- yo |
|
- wo |
|
library: xvasynth |
|
tags: |
|
- emotion |
|
- audio |
|
- text-to-speech |
|
- speech-to-speech |
|
- voice conversion |
|
- tts |
|
pipeline_tag: text-to-speech |
|
--- |
|
|
|
GitHub project: https://github.com/DanRuta/xVA-Synth |
|
|
|
This is the base model for training other xVASynth "xVAPitch" type (v3) models. The model itself is used by the xVATrainer TTS model training app, not for inference. All created by Dan Ruta (["@dr00392"](https://huggingface.co/dr00392)). |
|
|
|
`The v3 model now uses a slightly custom tweaked VITS/YourTTS model. Tweaks including larger capacity, bigger lang embedding, custom symbol set (a custom spec of ARPAbet with some more phonemes to cover other languages), and I guess a different training script.` - Dan Ruta |
|
|
|
When used in the xVASynth editor, it is an American adult male voice. The default pacing is too fast and has to be adjusted. |
|
|
|
xVAPitch_5820651 model sample: <audio controls> |
|
<source src="https://huggingface.co/Pendrokar/xvapitch/resolve/main/xVAPitch_5820651.wav?download=true" type="audio/wav"> |
|
Your browser does not support the audio element. |
|
</audio> |
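
If the embedded player does not load, the sample can also be fetched directly. The following is a minimal sketch using only the Python standard library. The repository ID and file name are taken from the sample link above; the helper names `sample_url` and `download_sample` are hypothetical and not part of xVASynth or xVATrainer, and the download step assumes the file remains publicly accessible.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Values copied from the sample link in this model card.
REPO_ID = "Pendrokar/xvapitch"
SAMPLE_FILE = "xVAPitch_5820651.wav"


def sample_url(repo_id: str = REPO_ID,
               filename: str = SAMPLE_FILE,
               revision: str = "main") -> str:
    """Build the Hub "resolve" URL for a file (hypothetical helper)."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}?download=true"
    )


def download_sample(target_dir: str = ".") -> Path:
    """Download the sample WAV into target_dir (requires network access)."""
    target = Path(target_dir) / SAMPLE_FILE
    urlretrieve(sample_url(), str(target))
    return target
```

Calling `download_sample()` saves the WAV file to the current directory and returns its path. Note that this fetches only the audio sample, not the model weights.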
|
|
|
Papers: |
|
- VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103 |
|
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418 |
|
|
|
Referenced papers within code: |
|
- Multi-head attention with Relative Positional embedding - https://arxiv.org/pdf/1809.04281.pdf |
|
- Transformer with Relative Positional Encoding - https://arxiv.org/abs/1803.02155 |
|
- Stochastic Duration Predictor (SDP) - https://arxiv.org/pdf/2106.06103.pdf |
|
- Spline Flow - https://arxiv.org/abs/1906.04032 |
|
|
|
Datasets used: Unknown/non-permissible data |