title: Compressed Wav2Lip
emoji: 🌟
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.13.0
app_file: app.py
pinned: true
license: apache-2.0

28× Compressed Wav2Lip by Nota AI

Official codebase for Accelerating Speech-Driven Talking Face Generation with 28× Compressed Wav2Lip.

Installation

Docker (recommended)

git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git
cd nota-wav2lip
docker compose run --service-ports --name nota-compressed-wav2lip compressed-wav2lip bash

Conda

git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git
cd nota-wav2lip
apt-get update
apt-get install ffmpeg libsm6 libxext6 tmux git -y
conda create -n nota-wav2lip python=3.9
conda activate nota-wav2lip
pip install -r requirements.txt
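
After installation, it can help to confirm that the command-line tools installed above are on the PATH before running anything heavier. The following is a minimal sketch, not part of the repository: the function name check_tools is hypothetical, and the tool list in the usage comment only mirrors the packages named in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nota-wav2lip): report which of the
# given command-line tools cannot be found on PATH.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Exit status 0 when every tool was found, 1 otherwise.
  return "$missing"
}

# Example usage (run manually inside the activated conda environment):
#   check_tools ffmpeg tmux git python pip
```

A non-zero exit status means at least one tool was not found; its name is printed to standard error.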

Gradio Demo

Run the script below to launch the nota-ai/compressed-wav2lip demo. The models and sample data are downloaded automatically.

bash app.sh

Inference

(1) Download the YouTube videos listed in the LRS3-TED label text files and preprocess them.

  • Download lrs3_v0.4_txt.zip from this link.
  • Unzip the file so that the resulting folder structure is ./data/lrs3_v0.4_txt/lrs3_v0.4/test
  • Run bash download.sh
  • Run bash preprocess.sh
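
The folder layout from the steps above can be verified before the download and preprocessing scripts are run. This is a minimal sketch under stated assumptions: check_lrs3_layout is a hypothetical helper, not part of the repository, and the unzip command in the usage comment assumes the archive unpacks to lrs3_v0.4/test at its top level, which may not match the actual archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nota-wav2lip): confirm that the LRS3
# label folder expected by step (1) exists under the given project root.
check_lrs3_layout() {
  local root="${1:-.}"
  local target="$root/data/lrs3_v0.4_txt/lrs3_v0.4/test"
  if [ -d "$target" ]; then
    echo "ok: $target"
    return 0
  fi
  echo "missing: $target" >&2
  return 1
}

# Example usage from the repository root (run manually):
#   mkdir -p ./data/lrs3_v0.4_txt
#   unzip -q lrs3_v0.4_txt.zip -d ./data/lrs3_v0.4_txt   # assumed archive layout
#   check_lrs3_layout . && bash download.sh && bash preprocess.sh
```

If the check fails, move the extracted files by hand until the test folder sits at the path shown, then rerun the scripts.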

(2) Run the script to compare the original Wav2Lip with Nota's compressed version.

bash inference.sh

License

  • All rights related to this repository and the compressed models are reserved by Nota Inc.
  • The intended use is strictly limited to research and non-commercial projects.

Contact

Acknowledgment

Citation

@article{kim2023unified,
      title={A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation}, 
      author={Kim, Bo-Kyeong and Kang, Jaemin and Seo, Daeun and Park, Hancheol and Choi, Shinkook and Song, Hyoung-Kyu and Kim, Hyungshin and Lim, Sungsu},
      journal={MLSys Workshop on On-Device Intelligence (ODIW)},
      year={2023},
      url={https://arxiv.org/abs/2304.00471}
}