metadata
title: Compressed Wav2Lip
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.13.0
app_file: app.py
pinned: true
license: apache-2.0
28× Compressed Wav2Lip by Nota AI
Official codebase for Accelerating Speech-Driven Talking Face Generation with 28× Compressed Wav2Lip.
- Presented at ICCV'23 Demo Track; On-Device Intelligence Workshop @ MLSys'23; NVIDIA GTC 2023 Poster.
Installation
Docker (recommended)
git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git
cd nota-wav2lip
docker compose run --service-ports --name nota-compressed-wav2lip compressed-wav2lip bash
Conda
git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git
cd nota-wav2lip
apt-get update
apt-get install ffmpeg libsm6 libxext6 tmux git -y
conda create -n nota-wav2lip python=3.9
conda activate nota-wav2lip
pip install -r requirements.txt
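Before launching the demo or inference scripts, it can help to confirm that the system tools installed above are actually on the PATH. The following is a minimal sketch, not part of the repository; the helper name `missing_cmds` is hypothetical.

```shell
# Hypothetical helper (not part of this repo): list which of the given
# commands cannot be found on PATH.
missing_cmds() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Print the missing commands; empty output means all were found.
    echo "${missing# }"
    # Succeed only when nothing is missing.
    [ -z "$missing" ]
}
```

For example, `missing_cmds ffmpeg git tmux conda` prints the names of any tools that still need to be installed and returns a non-zero status if at least one is missing.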
Gradio Demo
Use the script below to run the nota-ai/compressed-wav2lip demo. The models and sample data are downloaded automatically.
bash app.sh
Inference
(1) Download YouTube videos in the LRS3-TED label text file and preprocess them properly.
- Download lrs3_v0.4_txt.zip from this link.
- Unzip the file and make a folder structure: ./data/lrs3_v0.4_txt/lrs3_v0.4/test
- Run bash download.sh
- Run bash preprocess.sh
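Because the download and preprocessing scripts depend on the label files being in the expected place, a quick layout check before running them may save time. This is a sketch under the assumption that the scripts read from the folder shown above; the helper name `check_lrs3_layout` is hypothetical and not provided by the repository.

```shell
# Hypothetical helper (not part of this repo): verify that the LRS3-TED
# label folder exists. The default path is the one given in the steps above.
check_lrs3_layout() {
    root="${1:-./data/lrs3_v0.4_txt/lrs3_v0.4/test}"
    if [ -d "$root" ]; then
        echo "found: $root"
        return 0
    fi
    echo "missing: $root" >&2
    return 1
}
```

A possible usage is `check_lrs3_layout && bash download.sh && bash preprocess.sh`, which stops early with a message on stderr if the folder is not where it is expected.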
(2) Run the script to compare the original Wav2Lip with Nota's compressed version.
bash inference.sh
License
- All rights related to this repository and the compressed models are reserved by Nota Inc.
- The intended use is strictly limited to research and non-commercial projects.
Contact
- To obtain compression code and assistance, kindly contact Nota AI ([email protected]). These are provided as part of our business solutions.
- For Q&A about this repo, use this board: Nota-NetsPresso/discussions
Acknowledgment
- NVIDIA Applied Research Accelerator Program for supporting this research.
- Wav2Lip and LRS3-TED for facilitating the development of the original Wav2Lip.
Citation
@article{kim2023unified,
title={A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation},
author={Kim, Bo-Kyeong and Kang, Jaemin and Seo, Daeun and Park, Hancheol and Choi, Shinkook and Song, Hyoung-Kyu and Kim, Hyungshin and Lim, Sungsu},
journal={MLSys Workshop on On-Device Intelligence (ODIW)},
year={2023},
url={https://arxiv.org/abs/2304.00471}
}