generate_full output is cut unexpectedly.
Hi hexgrad,
Thank you for Kokoro. It is an amazing model.
I tested generate_full with the following text. At around 25-28 seconds, near the phrase "we had to move out", the audio jumps/cuts unexpectedly. Do you have any suggestions for mitigating the jump?
"As I walked through the old neighborhood, I couldn't help but notice the house where I used to live. It had been abandoned for years, and the once beautiful garden was overgrown with weeds. The windows were boarded up, and the front door was hanging off its hinges. It was sad to see the house in such a state, as it held many happy memories for me. I remembered how my family had to abandon our plans to renovate the house when my dad lost his job, and eventually, we had to move out. The house had been left to decay, and it seemed like the owners had completely abandon all hope of restoring it to its former glory. Now, it stood as a reminder of what happens when people abandon their homes and their memories."
I'm experiencing the same thing. I think we have to do the stitching of sentences ourselves for it to sound natural.
I've tried splitting on punctuation, which works perfectly, but when I combine the segments back together, the gaps between them don't flow naturally, i.e. there is a brief pause between items. Ideally I'd like the gaps to be minimized, as with generate_full, but without any jarring jumps.
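For what it's worth, here is a rough sketch of the kind of stitching I mean: overlap the end of each segment with the start of the next using a short equal-power crossfade, instead of butting them together. This assumes each segment is a 1-D float numpy array at Kokoro's 24 kHz output rate; the 40 ms fade length and the `synthesize` call in the usage comment are illustrative, not part of the library.

```python
import numpy as np

SAMPLE_RATE = 24000  # assumed sample rate of the generated segments

def crossfade_concat(segments, fade_ms=40):
    """Join audio segments with a short equal-power crossfade at each
    boundary, instead of a hard cut or an audible gap."""
    fade_len = int(SAMPLE_RATE * fade_ms / 1000)
    out = segments[0].astype(np.float32)
    for seg in segments[1:]:
        seg = seg.astype(np.float32)
        n = min(fade_len, len(out), len(seg))
        if n == 0:
            out = np.concatenate([out, seg])
            continue
        t = np.linspace(0.0, 1.0, n, dtype=np.float32)
        fade_out = np.cos(t * np.pi / 2)  # tail of the previous segment
        fade_in = np.sin(t * np.pi / 2)   # head of the next segment
        overlap = out[-n:] * fade_out + seg[:n] * fade_in
        out = np.concatenate([out[:-n], overlap, seg[n:]])
    return out

# segments = [synthesize(sentence) for sentence in sentences]  # hypothetical per-sentence call
# audio = crossfade_concat(segments)
```

Overlapping by a few tens of milliseconds should hide the seam without noticeably eating into the speech, though the exact fade length probably needs tuning per voice.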
P.S. I apologize for accidentally clicking the "Close" button and ending our discussion.
I have the same issue. I tried splitting the text by sentences, and also splitting by sentences plus tokenizing the text. Neither worked; I think the odd pauses are something the model picked up from the training set.
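One thing that might help with the pause baked into each clip is trimming leading/trailing silence from every segment and inserting a fixed, short gap yourself, so the spacing is consistent rather than whatever silence each clip happens to end with. A minimal sketch, assuming librosa is available and the segments are numpy arrays at 24 kHz; the 30 dB threshold and 80 ms pause are guesses to tune, not values from the model:

```python
import numpy as np
import librosa

def trim_segment(audio, top_db=30):
    """Strip leading/trailing silence below `top_db` relative to the peak."""
    trimmed, _ = librosa.effects.trim(audio, top_db=top_db)
    return trimmed

def join_with_pause(segments, pause_ms=80, sample_rate=24000):
    """Trim each clip, then insert a fixed short pause between clips."""
    pause = np.zeros(int(sample_rate * pause_ms / 1000), dtype=np.float32)
    trimmed = [trim_segment(s).astype(np.float32) for s in segments]
    pieces = []
    for i, seg in enumerate(trimmed):
        pieces.append(seg)
        if i < len(trimmed) - 1:
            pieces.append(pause)
    return np.concatenate(pieces)
```

That at least makes the pauses uniform; combined with the crossfade idea above it might get reasonably close to generate_full's pacing without the jump.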
This might be an unsatisfying answer, but next week I intend to release a new base model along with a proper pip-installable inference library at https://github.com/hexgrad/kokoro that should solve (or at least substantially mitigate) the issues described here.