I'm working on talking head generation that takes audio and video as input. Can someone suggest a good existing architecture that can generate videos with low latency, or that could run in real time?
I think most existing open-source talking-head architectures only take audio and an image as input. You could check out SadTalker (https://sadtalker.github.io/), which takes audio and an image as inputs. For streaming, you'll have to go through an API with a WebSocket; check out D-ID's Streams API: https://docs.d-id.com/reference/createstream
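For reference, a minimal sketch of creating a D-ID stream over plain HTTP. The endpoint path, payload fields, and the follow-up WebRTC/SDP exchange are assumptions based on their docs, not tested code:

```python
import requests

API_KEY = "YOUR_DID_API_KEY"   # placeholder credential
BASE = "https://api.d-id.com"  # assumed base URL from D-ID's docs

# Create a new stream from a single source image (assumed payload shape).
resp = requests.post(
    f"{BASE}/talks/streams",
    headers={"Authorization": f"Basic {API_KEY}"},
    json={"source_url": "https://example.com/face.jpg"},  # hypothetical image URL
)
resp.raise_for_status()
stream = resp.json()

# The response is expected to contain an SDP offer and ICE servers that you
# answer via a WebRTC client before sending audio/script requests.
print(stream.get("id"), stream.get("session_id"))
```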
Tried SadTalker, but it takes too long. D-ID is proprietary; I'm looking for something open source. I also tried Wav2Lip and enhanced its output with GFPGAN. The output is good, but I want something faster.
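For context, the Wav2Lip + GFPGAN pipeline being described looks roughly like this (a sketch; checkpoint and file paths are placeholders, and the per-frame GFPGAN pass is what makes it slow):

```python
import subprocess
import cv2
from gfpgan import GFPGANer

# 1) Run Wav2Lip's inference script to produce a lip-synced video (paths are placeholders).
subprocess.run([
    "python", "Wav2Lip/inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
    "--face", "input_face.mp4",
    "--audio", "input_audio.wav",
    "--outfile", "wav2lip_out.mp4",
], check=True)

# 2) Enhance each frame with GFPGAN.
restorer = GFPGANer(model_path="checkpoints/GFPGANv1.4.pth", upscale=1,
                    arch="clean", channel_multiplier=2)

cap = cv2.VideoCapture("wav2lip_out.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("enhanced.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # paste_back=True returns the full restored frame, not just the face crop.
    _, _, restored = restorer.enhance(frame, has_aligned=False,
                                      only_center_face=False, paste_back=True)
    out.write(restored)

cap.release()
out.release()
# Note: re-mux the audio track from wav2lip_out.mp4 afterwards (e.g. with ffmpeg),
# since OpenCV writes video only.
```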