Fine-tuning code
Hello, I saw your work on YouTube. Very good. I would like to learn how to make Whisper support transcribing a dialect of my local language.
But there are still some things about the fine-tuning code that I don't understand.
Could you publish or share the code for model training?
Sure, happy to answer any questions to help you fine-tune your model! Our source code is essentially derived from this Huggingface speech recognition recipe, with only minor modifications. Training does require data, but we don't plan to release our dataset since it contains copyrighted material.
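Roughly speaking, that recipe boils down to the standard Seq2SeqTrainer fine-tuning loop. Here is a minimal sketch of that loop; the base checkpoint (`openai/whisper-small`), the Common Voice stand-in dataset, the `sentence` text column, and all hyperparameters are placeholder assumptions you would swap for your own data, not our actual setup:

```python
# Minimal Whisper fine-tuning sketch with Hugging Face transformers/datasets.
# All names marked "assumed" are placeholders, not the setup used in this thread.
import torch
from dataclasses import dataclass
from datasets import load_dataset, Audio
from transformers import (
    WhisperProcessor,
    WhisperForConditionalGeneration,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "openai/whisper-small"  # assumed base checkpoint
processor = WhisperProcessor.from_pretrained(
    model_name, language="chinese", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# Assumed stand-in dataset; replace with your own audio/transcript pairs.
ds = load_dataset("mozilla-foundation/common_voice_11_0", "zh-CN", split="train[:1%]")
# Whisper expects 16 kHz audio, so resample on the fly.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["audio"]
    # Log-mel input features for the encoder.
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Token ids for the decoder targets ("sentence" is the assumed text column).
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

@dataclass
class DataCollator:
    processor: WhisperProcessor

    def __call__(self, features):
        # Pad audio features and label ids separately, then mask padding with -100
        # so it is ignored by the loss.
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )
        # If a bos token was already prepended during tokenization, drop it here,
        # since the model re-adds the decoder start token when shifting labels.
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",   # assumed output path
    per_device_train_batch_size=8,      # illustrative hyperparameters
    learning_rate=1e-5,
    max_steps=1000,
    fp16=torch.cuda.is_available(),
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollator(processor),
    tokenizer=processor.feature_extractor,
)
trainer.train()
```

The recipe itself layers evaluation (e.g. WER) and more configuration options on top, but the core training loop is essentially this.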
Hey, thank you for your great work! Could you please share the code for your data processing pipeline? I am currently working on speech synthesis for the Teochew dialect and may need to use weakly supervised or unsupervised pre-training to improve performance with low-resource labeled data.
@luckyt Can I fine-tune distil-whisper/distil-large-v2 with seq2seq using the speech-recognition recipe above?