license: apache-2.0
This model is a fine-tuned version of whisper-small, trained on 1M audio caption samples from the mitermix/audiosnippets dataset and 500K samples from an audio emotion dataset.
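
As a rough usage sketch, a Whisper checkpoint like this one can usually be loaded through the `automatic-speech-recognition` pipeline in the `transformers` library. This assumes the checkpoint is published in the standard Whisper format. The helper name `caption_audio` and its `model_id` argument are hypothetical, since the repository id is not stated here, and the expectation that the output is a caption rather than a transcript follows from the training data described above and has not been verified.

```python
def caption_audio(audio_path: str, model_id: str) -> str:
    """Return the text the model generates for one audio file.

    `model_id` is the Hugging Face repository id of this model. It is not
    given in this card, so the caller must supply it (assumption).
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    # Whisper checkpoints load through the speech-recognition pipeline; a
    # caption-tuned checkpoint is expected to emit a caption, not a transcript.
    captioner = pipeline("automatic-speech-recognition", model=model_id)

    # Decoding a file path requires ffmpeg to be installed on the system.
    result = captioner(audio_path)
    return result["text"].strip()
```

Calling `caption_audio("example.wav", model_id)` with the actual repository id of this model should return a single string. Whisper processes audio in 30-second windows, so longer clips may need chunking.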