Multi-models of Space (Kokoro)

#16
by ecyht2 - opened

There is a new version of StyleTTS Kokoro shown in this post. Maybe there should be a separate model entry for it?

In essence you are asking for multi-model support. And yes, it is something I'd like to have. It is the only reason Parler Large is not an available model, as it lives in the same TTS Space.

Whereas a strict per-model leaderboard would not get reliable data; there would be too few data points for each separate fine-tune... 😕

[edit] And the reason kokoro isn't currently working is because the API endpoints changed. I'll fix it when this Space decides to do a soft reset of the Gradio client.

Oops, thought I was keeping it backwards compatible, but clearly not. I have restored API parity with some temporary hacks on my end.

For now, Kokoro should be working in the Arena again (tested).

https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena/discussions/17 should future-proof against API-breaking changes. Once that PR lands, I can remove the temporary hacks.
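For context, here is a minimal sketch of the kind of gradio_client call that breaks when a Space renames its API endpoints. The Space ID and api_name below are illustrative assumptions, not the Arena's actual configuration:

```python
# Minimal sketch (assumed Space ID and endpoint name, not the Arena's real code):
# querying a TTS Space through gradio_client. If the Space renames or
# re-parameterizes its endpoint, the predict() call below starts failing
# until the client-side call is updated.
from gradio_client import Client

client = Client("hexgrad/Kokoro-TTS")  # assumed Space ID

# Print the endpoints the Space currently exposes and their parameters.
client.view_api()

# Call a named endpoint; an endpoint rename shows up as an error here.
result = client.predict(
    "Hello from the Arena.",  # text to synthesize
    api_name="/generate",     # assumed endpoint name
)
print(result)
```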

Pendrokar changed discussion title from New Version of StyleTTS Kokoro to Multi-models of Space (Kokoro)

@Pendrokar Kokoro v0.19 remains the stable version for this Arena, even with v0.22 dropping. I think the English differences should be minor, and there's a chance v0.22 might be superseded soon™ if/when I crack Hindi or stumble across more training data. IMO, it doesn't make sense to bump Kokoro v0.19 since it's already in the 🥇 spot by a decent margin. As the saying goes, "If it ain't broke, don't fix it."

Aside from going multilingual, v0.22 includes slightly better tokenization for hard English text (not always perfect, but better):

After I read that you can read, I can associate myself with your associates too.
ˈæftɚɹ aɪ ɹˈɛd ðæt juː kæn ɹˈiːd, aɪ kæn ɐsˈoʊsɪˌeɪt maɪsˈɛlf wɪð jʊɹ ɐsˈoʊsiəts tˈuː.

On the other hand, v0.19 returns the following:

After I read that you can read, I can associate myself with your associates too.
ˈæftɚɹ aɪ ɹˈiːd ðæt juː kæn ɹˈiːd, aɪ kæn ɐsˈoʊsɪˌeɪt maɪsˈɛlf wɪð jʊɹ ɐsˈoʊsɪˌeɪts tˈuː.

Even though the second "associates" still sounds fine in v0.19, note that the input phonemes are wrong, which means the model has learned to compensate for the g2p error. That's arguably fine, but (1) relying on the model to make up for g2p errors isn't robust across all texts, and (2) it's wasted neurons that could be going toward learning more useful patterns. When you only have 82M params and you want to stuff a bunch of languages into the model, I think it's worth aggressively prosecuting these g2p errors.
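For anyone wanting to reproduce a comparison like the one above, here is a minimal sketch using the phonemizer package with the espeak-ng backend; that backend choice is an assumption for illustration and may not match the exact g2p pipeline of either Kokoro version:

```python
# Minimal sketch (assumes the `phonemizer` package and an installed espeak-ng
# backend; not necessarily Kokoro's own g2p pipeline). It phonemizes the test
# sentence so the output can be compared against the IPA strings above.
from phonemizer import phonemize

text = "After I read that you can read, I can associate myself with your associates too."

ipa = phonemize(
    text,
    language="en-us",
    backend="espeak",
    with_stress=True,           # keep primary/secondary stress marks
    preserve_punctuation=True,  # keep the comma and period in the output
)
print(ipa)
```

Heteronyms like "read" (past vs. present tense) and "associate" (verb vs. noun) are exactly where sentence-level context matters, so this sentence is a reasonable spot check for any g2p front end.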
