CAFA - Controllable Automatic Foley Artist

Code

Paper

CAFA (Controllable Automatic Foley Artist) is a controllable text-video-to-audio model for Foley sound generation. Given a short video and a textual prompt, CAFA generates a synchronized audio waveform that matches both the visual content and the desired semantics described in the prompt. This allows users to modify or override the natural sound of the video by changing the prompt, enabling fine-grained control over the generated audio.
