Have you ever trained a 44k model?
I failed to train from scratch: the model cannot learn alignment from a tiny dataset, so I want to try your solution.
Hello, how much data do you have, and in which language? Actually, I didn't train it from scratch; I started from the original F5TTS weights. But my solution might still help with alignment.
About 12 hours, in Indonesian.
This method uses phonemes instead of raw text and uses forced alignment during training. Although my training vocabulary differs from the original F5TTS model's, I still use the pre-trained weights (i.e., I only re-initialize the text embedding layer), because the pre-trained model already knows how to produce sound, so training should be faster. I suggest you start from the pre-trained model instead of training from scratch.
Previously I ran some experiments with LJSpeech, and the model could learn with only 10 hours of data. I am not sure how it will do on a different language. I am currently running experiments with another language (Vietnamese), so maybe we will get more insight.
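To illustrate why the text embedding has to be re-initialized: a phoneme vocabulary built from the training transcripts generally has a different size and index mapping than the original F5TTS character vocabulary, so the pre-trained embedding rows no longer line up. A minimal sketch, assuming the transcripts are already phonemized into space-separated symbols; `build_phoneme_vocab`, `encode`, and the `<pad>`/`<unk>` symbols are hypothetical and not taken from the fork:

```python
from typing import Dict, Iterable, List

PAD = "<pad>"  # hypothetical padding symbol, reserved at index 0
UNK = "<unk>"  # hypothetical symbol for phonemes not seen when building the vocab


def build_phoneme_vocab(phonemized: Iterable[str]) -> Dict[str, int]:
    """Map every distinct phoneme symbol to an integer id (sorted for determinism)."""
    symbols = set()
    for line in phonemized:
        symbols.update(line.split())
    vocab = {PAD: 0, UNK: 1}
    for sym in sorted(symbols):
        vocab[sym] = len(vocab)
    return vocab


def encode(line: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a space-separated phoneme string to ids, falling back to UNK."""
    return [vocab.get(sym, vocab[UNK]) for sym in line.split()]


if __name__ == "__main__":
    # Toy phoneme strings; real ones would come from espeak for the target language.
    transcripts = ["s a l a m", "t e r i m a k a s i h"]
    vocab = build_phoneme_vocab(transcripts)
    print(len(vocab))  # number of rows the new text embedding table needs
    print(encode("s a y a", vocab))  # "y" is unseen, so it maps to <unk>
```

Because `len(vocab)` differs from the original character vocabulary, the text embedding weight cannot be loaded from the checkpoint and starts from random initialization, while the rest of the transformer can still reuse the pre-trained weights.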
Hello @sinhprous, can you complete the inference code for f5-tts_infer-cli?
The fine-tuning also needs detailed steps, including how to declare the language code.
When I fine-tune with your fork, it prints these errors but continues. Is that OK?
Missing keys: ['ema_model.transformer.text_embed.text_embed.weight']
Unexpected keys: []
Missing keys: ['transformer.text_embed.text_embed.weight', 'duration_predictor.text_embed.weight', 'duration_predictor.conv_1.weight', 'duration_predictor.conv_1.bias', 'duration_predictor.norm_1.gamma', 'duration_predictor.norm_1.beta', 'duration_predictor.conv_2.weight', 'duration_predictor.conv_2.bias', 'duration_predictor.norm_2.gamma', 'duration_predictor.norm_2.beta', 'duration_predictor.proj.weight', 'duration_predictor.proj.bias']
Unexpected keys: []
It's okay: the text embedding layer is re-initialized and a duration predictor is added, so those weights do not exist in the original checkpoint.
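A quick way to confirm that the missing keys in a log like the one above are only the expected ones (the re-initialized text embedding and the newly added duration predictor) is to check them against those prefixes. In PyTorch, `model.load_state_dict(state_dict, strict=False)` returns the `missing_keys` and `unexpected_keys` lists such a check could be run on. This is a minimal sketch; the prefix list is an assumption read off the log above, and `unexpected_missing` is a hypothetical helper, not part of the fork:

```python
from typing import Iterable, List, Tuple

# Prefixes assumed to be expected: the text embedding is re-initialized for the
# phoneme vocabulary and the duration predictor is new, so neither exists in the
# original F5TTS checkpoint. Any other missing key would deserve a closer look.
EXPECTED_PREFIXES: Tuple[str, ...] = (
    "transformer.text_embed.text_embed.",
    "ema_model.transformer.text_embed.text_embed.",
    "duration_predictor.",
)


def unexpected_missing(missing_keys: Iterable[str],
                       expected_prefixes: Tuple[str, ...] = EXPECTED_PREFIXES) -> List[str]:
    """Return the missing keys that do NOT fall under any expected prefix."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return [k for k in missing_keys if not k.startswith(expected_prefixes)]


if __name__ == "__main__":
    missing = [
        "transformer.text_embed.text_embed.weight",
        "duration_predictor.conv_1.weight",
        "duration_predictor.proj.bias",
    ]
    leftovers = unexpected_missing(missing)
    print("OK" if not leftovers else f"Check these keys: {leftovers}")
```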
Okay, I will complete f5-tts_infer-cli. In the meantime, you can use the notebook for inference.
Hello @sinhprous, I have finished training on Indonesian. The result is not good: the WER is much higher than with the official code.
Could you share some samples? How many epochs did you train?
350k. I have also tried 150k, 200k, and 300k.
I saw the same thing in my Vietnamese training; the results are bad.
Maybe the alignment is wrong for languages other than English.
If possible, could you share one of your training samples (audio, text, and the alignment matrix)?
Sorry, it is private data. I have started fine-tuning from the official code again.
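For reference on what the alignment matrix looks like: with forced alignment, each phoneme covers a contiguous run of mel frames, so the hard alignment can be represented as a 0/1 matrix with one row per phoneme and one column per frame. A minimal sketch, assuming integer per-phoneme frame durations are already available; `durations_to_alignment` is a hypothetical name. Printed or plotted for a sample, a good alignment shows a clean monotonic staircase, while large gaps or jumps point to alignment problems.

```python
from typing import List


def durations_to_alignment(durations: List[int]) -> List[List[int]]:
    """Build a phoneme-by-frame 0/1 matrix from per-phoneme frame counts.

    Row i has ones exactly on the frames assigned to phoneme i, so every
    frame belongs to exactly one phoneme and the ones form a staircase.
    """
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    total_frames = sum(durations)
    matrix = [[0] * total_frames for _ in durations]
    start = 0
    for i, d in enumerate(durations):
        for t in range(start, start + d):
            matrix[i][t] = 1
        start += d
    return matrix


if __name__ == "__main__":
    # Render each phoneme row as text: '#' marks the frames it occupies.
    for row in durations_to_alignment([2, 1, 3]):
        print("".join("#" if v else "." for v in row))
```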
Did you change the espeak language?
Yes, I changed the espeak language. After reviewing, I think ctc-forced-aligner's results are not correct for my dataset. I've post-processed the alignment results a bit and started training again.
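The thread doesn't say which post-processing was applied. As one possible approach (an assumption, not necessarily what was done here), per-phoneme time spans from an aligner can be turned into integer mel-frame durations that are guaranteed to be non-negative and to cover the whole utterance. The sketch assumes F5-TTS-style mel settings of 24 kHz and hop length 256 (adjust both for a 44k model); `segments_to_durations` is a hypothetical helper:

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) for one phoneme


def segments_to_durations(segments: List[Segment], total_frames: int,
                          sample_rate: int = 24000, hop_length: int = 256) -> List[int]:
    """Turn per-phoneme time spans into integer mel-frame durations.

    Gaps or overlaps between neighbours are resolved by cutting at their
    midpoint, the first phoneme is stretched back to frame 0 and the last one
    forward to the final frame, and boundaries are forced to be non-decreasing,
    so the durations are non-negative and always sum to total_frames.
    """
    if not segments:
        raise ValueError("need at least one segment")
    frames_per_sec = sample_rate / hop_length
    # Inner boundaries: midpoint between one segment's end and the next one's start.
    inner = [(segments[i][1] + segments[i + 1][0]) / 2 for i in range(len(segments) - 1)]
    bounds = [0] + [round(t * frames_per_sec) for t in inner] + [total_frames]
    # Clamp into range and enforce monotonicity (aligner output may be out of order).
    for i in range(1, len(bounds)):
        bounds[i] = min(max(bounds[i], bounds[i - 1]), total_frames)
    return [bounds[i + 1] - bounds[i] for i in range(len(segments))]


if __name__ == "__main__":
    segs = [(0.1, 0.3), (0.35, 0.6), (0.6, 1.0)]
    print(segments_to_durations(segs, total_frames=94))  # [30, 26, 38]
```

Cutting at midpoints assigns short silences to the neighbouring phonemes; an alternative design would give silences their own token, which changes the vocabulary.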
OK, looking forward to your results.
@LukeJacob2023 did your training succeed? After I fixed the alignment preparation, my Vietnamese training is going well.
Yes, I trained for 1029k on the official code and got a good result, with some speed and stopping problems, but not many: about 5 problems in 3 minutes of output audio. If you update your fork, I will give it a try. If it saves a lot of training time and makes inference more stable at the cost of only a little naturalness, I think it is a good solution, especially for people like me who don't have powerful GPUs or large datasets. You could enable issues on your fork so we can discuss it there.
Hello @sinhprous, can you update your fork or share the Vietnamese checkpoint?
Hey @LukeJacob2023, sorry for the late reply. I am busy with other company projects; I can update it this weekend.