Generate realistic talking heads from an image and audio
FitDiT is a high-fidelity virtual try-on model.
https://huggingface.co/papers/2501.03006
Audio Conditioned LipSync with Latent Diffusion Models
InstantID-XS
Text to Audio (Sound SFX) Generator
Generate images with Switti