|
--- |
|
title: Compressed Wav2Lip |
|
emoji: π |
|
colorFrom: indigo |
|
colorTo: pink |
|
sdk: gradio |
|
sdk_version: 4.39.0 |
|
app_file: app.py |
|
pinned: true |
|
license: apache-2.0 |
|
--- |
|
|
|
# 28Γ Compressed Wav2Lip by Nota AI |
|
|
|
Official codebase for [**Accelerating Speech-Driven Talking Face Generation with 28Γ Compressed Wav2Lip**](https://arxiv.org/abs/2304.00471). |
|
|
|
- Presented at [ICCV'23 Demo](https://iccv2023.thecvf.com/demos-111.php) Track; [On-Device Intelligence Workshop](https://sites.google.com/g.harvard.edu/on-device-workshop-23/home) @ MLSys'23; [NVIDIA GTC 2023](https://www.nvidia.com/en-us/on-demand/search/?facet.mimetype[]=event%20session&layout=list&page=1&q=52409&sort=relevance&sortDir=desc) Poster. |
|
|
|
|
|
## Installation |
|
#### Docker (recommended) |
|
```bash |
|
git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git |
|
cd nota-wav2lip |
|
docker compose run --service-ports --name nota-compressed-wav2lip compressed-wav2lip bash |
|
``` |
|
|
|
#### Conda |
|
<details> |
|
<summary>Click</summary> |
|
|
|
```bash |
|
git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git |
|
cd nota-wav2lip |
|
apt-get update |
|
apt-get install ffmpeg libsm6 libxext6 tmux git -y |
|
conda create -n nota-wav2lip python=3.9 |
|
conda activate nota-wav2lip |
|
pip install -r requirements.txt |
|
``` |
|
</details> |
|
|
|
## Gradio Demo |
|
Use the below script to run the [nota-ai/compressed-wav2lip demo](https://huggingface.co/spaces/nota-ai/compressed-wav2lip). The models and sample data will be downloaded automatically. |
|
|
|
```bash |
|
bash app.sh |
|
``` |
|
|
|
## Inference |
|
(1) Download YouTube videos in the LRS3-TED label text file and preprocess them properly. |
|
- Download `lrs3_v0.4_txt.zip` from [this link](https://mmai.io/datasets/lip_reading/). |
|
- Unzip the file and make a folder structure: `./data/lrs3_v0.4_txt/lrs3_v0.4/test` |
|
- Run `bash download.sh` |
|
- Run `bash preprocess.sh` |
|
|
|
(2) Run the script to compare the original Wav2Lip with Nota's compressed version. |
|
|
|
```bash |
|
bash inference.sh |
|
``` |
|
|
|
## License |
|
- All rights related to this repository and the compressed models are reserved by Nota Inc. |
|
- The intended use is strictly limited to research and non-commercial projects. |
|
|
|
## Contact |
|
- To obtain compression code and assistance, kindly contact Nota AI ([email protected]). These are provided as part of our business solutions. |
|
- For Q&A about this repo, use this board: [Nota-NetsPresso/discussions](https://github.com/orgs/Nota-NetsPresso/discussions) |
|
|
|
## Acknowledgment |
|
- [NVIDIA Applied Research Accelerator Program](https://www.nvidia.com/en-us/industries/higher-education-research/applied-research-program/) for supporting this research. |
|
- [Wav2Lip](https://github.com/Rudrabha/Wav2Lip) and [LRS3-TED](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/) for facilitating the development of the original Wav2Lip. |
|
|
|
## Citation |
|
```bibtex |
|
@article{kim2023unified, |
|
title={A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation}, |
|
author={Kim, Bo-Kyeong and Kang, Jaemin and Seo, Daeun and Park, Hancheol and Choi, Shinkook and Song, Hyoung-Kyu and Kim, Hyungshin and Lim, Sungsu}, |
|
journal={MLSys Workshop on On-Device Intelligence (ODIW)}, |
|
year={2023}, |
|
url={https://arxiv.org/abs/2304.00471} |
|
} |
|
``` |