
Voices

For each voice, the grades below are estimates of the quality and quantity of its training data, both of which affect overall inference quality.

Voices may also subjectively sound better or worse to different people.

Target Quality

  • How high-quality is the reference voice? This grade may be affected by audio quality, artifacts, compression, and sample rate.
  • How well do the text labels match the audio? Text/audio misalignment (e.g., from hallucinations) lowers this grade.

Training Duration

  • How much audio was seen during training? Shorter durations result in a lower overall grade.

American 🇺🇸

Name        Traits  Target Quality  Training Duration  Overall Grade
----------  ------  --------------  -----------------  -------------
af_alloy    🚺      B               MM minutes         C
af_aoede    🚺      A               H hours            B+
af_bella    🚺🔥    A               HH hours           A-
af_jessica  🚺      C               MM minutes         D
af_kore     🚺      B               H hours            C+
af_nicole   🚺🎧    B               HH hours           B-
af_nova     🚺      B               MM minutes         C
af_river    🚺      C               MM minutes         D
af_sarah    🚺      B               H hours            C+
af_sky      🚺      B               M minutes          C-
am_adam     🚹      D               H hours            F+
am_echo     🚹      C               MM minutes         D
am_eric     🚹      C               MM minutes         D
am_fenrir   🚹      B               H hours            C+
am_liam     🚹      C               MM minutes         D
am_michael  🚹      B               H hours            C+
am_onyx     🚹      C               MM minutes         D
am_puck     🚹      B               H hours            C+
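To choose a voice programmatically, the Overall Grade column can be treated as an ordered scale. The following is a minimal, hypothetical Python sketch; `GRADE_ORDER`, `grade_rank`, and `voices_at_least` are illustrative names, not part of Kokoro. It copies the grades from the American table above and filters voices by a minimum grade. It assumes only that grades order by letter and then by +/- modifier, and it says nothing about how the grades themselves were assigned.

```python
# Hypothetical helper, not part of Kokoro: rank voices by "Overall Grade".
# Assumption: grades order by letter (A > B > C > D > F), then by +/- modifier.
GRADE_ORDER = ["F", "F+", "D-", "D", "D+", "C-", "C", "C+",
               "B-", "B", "B+", "A-", "A", "A+"]

# Overall grades copied from the American table above.
AMERICAN_GRADES = {
    "af_alloy": "C", "af_aoede": "B+", "af_bella": "A-", "af_jessica": "D",
    "af_kore": "C+", "af_nicole": "B-", "af_nova": "C", "af_river": "D",
    "af_sarah": "C+", "af_sky": "C-", "am_adam": "F+", "am_echo": "D",
    "am_eric": "D", "am_fenrir": "C+", "am_liam": "D", "am_michael": "C+",
    "am_onyx": "D", "am_puck": "C+",
}


def grade_rank(grade: str) -> int:
    """Return a sortable integer for a letter grade; higher means better."""
    return GRADE_ORDER.index(grade)


def voices_at_least(grades: dict[str, str], minimum: str) -> list[str]:
    """Voices graded at or above `minimum`, best first.

    Python's sort is stable (also with reverse=True), so voices with
    equal grades keep their table order.
    """
    floor = grade_rank(minimum)
    eligible = [name for name, g in grades.items() if grade_rank(g) >= floor]
    return sorted(eligible, key=lambda name: grade_rank(grades[name]), reverse=True)


print(voices_at_least(AMERICAN_GRADES, "B-"))
```

With a minimum of `B-`, this prints `['af_bella', 'af_aoede', 'af_nicole']`.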

British 🇬🇧

Name        Traits  Target Quality  Training Duration  Overall Grade
----------  ------  --------------  -----------------  -------------
bf_alice    🚺      C               MM minutes         D
bf_emma     🚺      B               HH hours           B-
bf_isabella 🚺      B               MM minutes         C
bf_lily     🚺      C               MM minutes         D
bm_daniel   🚹      C               MM minutes         D
bm_fable    🚹      B               MM minutes         C
bm_george   🚹      B               MM minutes         C
bm_lewis    🚹      C               H hours            D+
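Across both tables, voice names follow a visible pattern. The first letter matches the accent section (a for American, b for British), and the second letter matches the gender trait (f for 🚺, m for 🚹). The following hypothetical Python sketch decodes names under that inferred convention. `parse_voice` and `VoiceInfo` are illustrative names, not a Kokoro API, and the mapping covers only the prefixes listed on this page.

```python
# Hypothetical helper, not a Kokoro API: decode a voice name like "bf_emma".
# Assumption (inferred from the tables above, not from an official spec):
#   first letter  = accent (a = American, b = British)
#   second letter = gender (f = female 🚺, m = male 🚹)
from typing import NamedTuple

ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


class VoiceInfo(NamedTuple):
    accent: str
    gender: str
    speaker: str


def parse_voice(name: str) -> VoiceInfo:
    """Split '<accent><gender>_<speaker>' into parts; raise ValueError otherwise."""
    prefix, sep, speaker = name.partition("_")
    if not sep or len(prefix) != 2 or not speaker:
        raise ValueError(f"unexpected voice name: {name!r}")
    accent_code, gender_code = prefix[0], prefix[1]
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"unknown prefix in voice name: {name!r}")
    return VoiceInfo(ACCENTS[accent_code], GENDERS[gender_code], speaker)


print(parse_voice("bf_emma"))
```

This prints `VoiceInfo(accent='British', gender='female', speaker='emma')`. Unknown prefixes raise an error instead of guessing, so the mapping must be extended if other accents appear.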