Transcribed a random video I was watching using my phone mic. It didn't even try in the second half.

#50
by IDontKnowWhatToNameMyself - opened

Input audio:

Transcription:

I am 100% turning me to a shark. I'm very excited about this.

This is expected - in the recording, there is a long gap with no speech between the first sentence and the second. When the model reaches this gap, it predicts the "end of sequence" token, which stops the transcription.

The model is trained to predict the "end of sequence" token when it hears such gaps in speech.
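
To make the behaviour concrete, here is a minimal toy sketch of it, not the model's actual code. Every name in it (`EOS`, `toy_next_token`, `greedy_transcribe`, `split_on_silence`, `transcribe_in_segments`) is hypothetical, and the "audio" is just a list of words with `None` standing in for silent frames. It assumes a decoding loop that stops at the first "end of sequence" token, and shows one possible workaround: splitting the recording at the silent gap and transcribing each part separately.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker for this toy example


def toy_next_token(frame):
    """Hypothetical stand-in for the model: a word for speech, EOS for silence (None)."""
    return EOS if frame is None else frame


def greedy_transcribe(frames):
    """Decode frame by frame and stop at the first EOS token."""
    tokens = []
    for frame in frames:
        token = toy_next_token(frame)
        if token == EOS:
            break  # everything after the gap is never decoded
        tokens.append(token)
    return " ".join(tokens)


def split_on_silence(frames):
    """Split frames into runs of speech, dropping the silent (None) frames."""
    segments, current = [], []
    for frame in frames:
        if frame is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(frame)
    if current:
        segments.append(current)
    return segments


def transcribe_in_segments(frames):
    """Transcribe each speech segment on its own, then join the results."""
    return " ".join(greedy_transcribe(seg) for seg in split_on_silence(frames))


audio = ["first", "sentence", None, None, "second", "sentence"]
print(greedy_transcribe(audio))       # first sentence
print(transcribe_in_segments(audio))  # first sentence second sentence
```

With real audio the split would have to come from something like an energy threshold or a voice activity detector; that part is left out here.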

Oh I see. That makes a lot more sense now

IDontKnowWhatToNameMyself changed discussion status to closed
