Whisper small audio captioning

This model is a whisper-small model fine-tuned on 1M audio caption samples from the mitermix/audiosnippets dataset and 500K samples from an audio emotion dataset.
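A minimal sketch of how the model might be used for captioning, assuming the checkpoint loads with the standard Whisper classes in the transformers library (the card does not confirm this) and that the repository ships its own processor files. The helper names `window_bounds` and `caption` are hypothetical, chosen for illustration. Whisper's encoder takes 16 kHz mono audio in fixed 30-second windows, so longer clips are split first.

```python
MODEL_ID = "cahya/whisper-small-emotion-v1.0"  # repo name as listed on this card
SAMPLE_RATE = 16_000   # Whisper's feature extractor expects 16 kHz mono audio
WINDOW_SECONDS = 30    # Whisper encodes fixed 30-second windows


def window_bounds(num_samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Return (start, end) sample indices that cover a clip in 30-second windows."""
    step = sample_rate * window_seconds
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


def caption(waveform, sample_rate=SAMPLE_RATE, max_new_tokens=64):
    """Caption a 1-D float waveform sampled at 16 kHz; returns one caption per window."""
    # Imported lazily so window_bounds works without the heavy dependencies.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    # Assumption: the repo includes processor files. If it does not, loading the
    # processor from "openai/whisper-small" is a plausible fallback.
    processor = WhisperProcessor.from_pretrained(MODEL_ID)
    model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()

    captions = []
    for start, end in window_bounds(len(waveform), sample_rate):
        # Convert the raw samples of this window to log-mel input features.
        inputs = processor(waveform[start:end], sampling_rate=sample_rate,
                           return_tensors="pt")
        with torch.no_grad():
            ids = model.generate(inputs.input_features,
                                 max_new_tokens=max_new_tokens)
        captions.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
    return captions
```

The waveform can come from any loader that yields 16 kHz mono floats, for example `librosa.load(path, sr=16000)`. The model is reloaded on every call here for brevity; in practice it would be loaded once and reused.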

Model size: 242M params (Safetensors)

Tensor type: F32

Collection including cahya/whisper-small-emotion-v1.0

Whisper Emotion Captioning: fine-tuned Whisper models for emotion captioning (13 items)