FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Abstract
With the rapid advancement of diffusion-based generative models, portrait image animation has achieved remarkable results. However, it still faces challenges in temporally consistent video generation and fast sampling due to the iterative nature of diffusion sampling. This paper presents FLOAT, an audio-driven talking portrait video generation method based on a flow matching generative model. We shift the generative modeling from the pixel-based latent space to a learned motion latent space, enabling the efficient design of temporally consistent motion. To achieve this, we introduce a transformer-based vector field predictor with a simple yet effective frame-wise conditioning mechanism. Additionally, our method supports speech-driven emotion enhancement, enabling the natural incorporation of expressive motion. Extensive experiments demonstrate that our method outperforms state-of-the-art audio-driven talking portrait methods in terms of visual quality, motion fidelity, and efficiency.
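Since no official code or model has been released (as noted in the comments below), the following is a minimal sketch of the two ingredients the abstract names: a conditional flow matching objective with a straight (rectified) probability path over motion latents, and few-step Euler sampling of the learned ODE. All module names, tensor shapes, and the additive frame-wise audio conditioning are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical setup: FLOAT's code is not released, so every name, shape,
# and design choice below is an illustrative assumption.
T, D = 50, 512  # frames per clip, motion-latent dimension (assumed)

class VectorFieldPredictor(nn.Module):
    """Toy transformer predicting the flow-matching vector field.

    Frame-wise conditioning is approximated here by adding a per-frame
    audio embedding to each noisy motion token (an assumption, not the
    paper's exact mechanism).
    """
    def __init__(self, dim=D, heads=8, layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)
        self.time_mlp = nn.Sequential(
            nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim)
        )

    def forward(self, x_t, t, audio_feat):
        # x_t: (B, T, D) noisy motion latents, t: (B,) flow time in [0, 1],
        # audio_feat: (B, T, D) per-frame audio features.
        t_emb = self.time_mlp(t[:, None])          # (B, D)
        h = x_t + audio_feat + t_emb[:, None, :]   # broadcast over frames
        return self.backbone(h)                    # predicted velocity (B, T, D)

def flow_matching_loss(model, x1, audio_feat):
    """Conditional flow matching with a straight (rectified) path.

    x1: (B, T, D) ground-truth motion latents from a motion encoder.
    Along x_t = (1 - t) * x0 + t * x1 the target vector field is x1 - x0.
    """
    x0 = torch.randn_like(x1)                      # Gaussian source sample
    t = torch.rand(x1.size(0), device=x1.device)   # uniform flow time
    x_t = (1 - t)[:, None, None] * x0 + t[:, None, None] * x1
    v_pred = model(x_t, t, audio_feat)
    return ((v_pred - (x1 - x0)) ** 2).mean()

@torch.no_grad()
def sample(model, audio_feat, steps=10):
    """Few-step Euler integration of the learned ODE from noise to motion."""
    B = audio_feat.size(0)
    x = torch.randn(B, T, D, device=audio_feat.device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((B,), i * dt, device=x.device)
        x = x + dt * model(x, t, audio_feat)
    return x  # motion latents to be decoded into video frames
```

Because the rectified path is near-straight, a handful of Euler steps can suffice at inference time, which is plausibly the source of the sampling-speed advantage the abstract claims over iterative diffusion sampling; operating on compact motion latents rather than pixel-based latents further reduces the cost per step.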
Community
Automated message from Librarian Bot: the following similar papers were recommended by the Semantic Scholar API.
- LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis (2024)
- Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis (2024)
- Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation (2024)
- AnimateAnything: Consistent and Controllable Animation for Video Generation (2024)
- DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation (2024)
- Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization (2024)
- Audio-Driven Emotional 3D Talking-Head Generation (2024)
No code, no model :(