This is a fine-tune of the Whisper Tiny model for use as a CMU voice-to-text model. No conversion is needed: the model outputs space-separated ARPABET symbols instead of plain English text. It was trained on a dataset with over 100 hours of meeting recordings of conversational English.
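Because the output is a space-separated string of ARPABET symbols, downstream code can split it on whitespace and check each token. The following is a minimal sketch, not part of the model itself: `parse_arpabet` is a hypothetical helper, the symbol set is the 39-phoneme inventory used by the CMU Pronouncing Dictionary, and the optional trailing stress digit (0, 1 or 2) is an assumption, since this card does not state whether the model emits stress markers.

```python
import re

# The 39 phonemes of the CMU Pronouncing Dictionary (ARPABET, without stress digits).
ARPABET = frozenset(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)

# A symbol is a run of letters, optionally followed by a stress digit (assumed).
_SYMBOL = re.compile(r"([A-Z]+)([0-2])?")


def parse_arpabet(text):
    """Split model output into (phoneme, stress) pairs.

    stress is an int when the token carries a stress digit, otherwise None.
    Raises ValueError for any token that is not a known ARPABET symbol.
    """
    parsed = []
    for token in text.split():
        match = _SYMBOL.fullmatch(token.upper())
        if match is None or match.group(1) not in ARPABET:
            raise ValueError(f"not an ARPABET symbol: {token!r}")
        stress = match.group(2)
        parsed.append((match.group(1), int(stress) if stress is not None else None))
    return parsed


if __name__ == "__main__":
    # Illustrative string only, not actual model output.
    print(parse_arpabet("HH AH0 L OW1"))
```

A validation step like this can help catch cases where the model falls back to emitting plain English words instead of phonemes.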
The model is still in development. It reaches a WER of 26, which is pretty good for just 100 hours of audio, and it could improve with more training data. Keep in mind that this is the tiny version of Whisper, which lacks precision. A larger variant such as base or small would probably be enough for this kind of application.
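For readers who want to run a comparable measurement, here is a rough sketch of an error rate computed over space-separated ARPABET tokens, using the standard word-error-rate definition (edit distance divided by reference length). It assumes each ARPABET symbol is treated as one "word"; the exact evaluation setup behind the figure above is not documented here, and `token_error_rate` is a hypothetical helper name.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def token_error_rate(reference, hypothesis):
    """Error rate over whitespace-separated tokens: edits / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only: one substituted symbol out of four.
    print(token_error_rate("HH AH L OW", "HH EH L OW"))
```

Multiplying the result by 100 gives a percentage. Note that an error rate over phoneme tokens is not directly comparable to a WER over English words.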