🛹 RollingDepth: Video Depth without Video Models

This repository is the official implementation of the paper "Video Depth without Video Models".

Bingxin Ke¹, Dominik Narnhofer¹, Shengyu Huang¹, Lei Ke², Torben Peters¹, Katerina Fragkiadaki², Anton Obukhov¹, Konrad Schindler¹

¹ETH Zurich, ²Carnegie Mellon University

📢 News

2024-11-28: Inference code is released.

πŸ› οΈ Setup

The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090

📦 Repository

git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth

🐍 Python environment

Create a Python environment:

# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate

# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth

💻 Dependencies

Install dependencies:

pip install -r requirements.txt

# Install modified diffusers with cross-frame self-attention
bash script/install_diffusers_dev.sh 

We use pyav for video I/O, which relies on ffmpeg.
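Before running inference, it can help to confirm that the video I/O stack is visible from the active environment. The following is a minimal sketch, not part of the repository: `check_video_io` is a hypothetical helper, and it assumes PyAV is importable as `av` and that an `ffmpeg` executable would be found on `PATH`. Whether a system-wide ffmpeg is strictly needed may depend on how PyAV was installed.

```python
import importlib.util
import shutil


def check_video_io() -> dict:
    """Report whether PyAV and an ffmpeg executable can be located.

    Hypothetical helper for illustration; not part of RollingDepth.
    """
    return {
        # PyAV is imported under the module name `av`.
        "pyav_importable": importlib.util.find_spec("av") is not None,
        # Full path of the ffmpeg executable on PATH, or None if absent.
        "ffmpeg_path": shutil.which("ffmpeg"),
    }


if __name__ == "__main__":
    for name, value in check_video_io().items():
        print(f"{name}: {value}")
```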

πŸƒ Test on your videos

All scripts are designed to run from the project root directory.

📷 Prepare input videos

  1. Use sample videos:

    bash script/download_sample_data.sh
    
  2. Or place your videos in a directory, for example, under data/samples.

🚀 Run with presets

python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    --save-npy true \
    --verbose

  • -p or --preset: preset options
    • fast for fast inference, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
    • fast1024 for fast inference at resolution 1024
    • full for better details, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
    • paper for reproducing paper numbers, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
  • -i or --input-video: path to input data, can be a single video file, a text file with video paths, or a directory of videos.
  • -o or --output-dir: output directory.
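The three accepted input forms (single file, text file of paths, directory) can be pictured with a small resolver. This is only an illustrative sketch, not the logic used by `run_video.py`: the function name `resolve_inputs`, the `.txt` convention for path lists, and the set of video extensions are all assumptions.

```python
from pathlib import Path

# Assumed set of video extensions; the actual script may accept others.
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".gif"}


def resolve_inputs(input_path: str) -> list[Path]:
    """Expand an input argument into a list of video paths (hypothetical)."""
    p = Path(input_path)
    if p.is_dir():
        # Directory: collect every file with a known video extension.
        return sorted(f for f in p.iterdir() if f.suffix.lower() in VIDEO_EXTS)
    if p.suffix.lower() == ".txt":
        # Text file: one video path per line, blank lines skipped.
        lines = p.read_text().splitlines()
        return [Path(line.strip()) for line in lines if line.strip()]
    # Anything else is treated as a single video file.
    return [p]
```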

Passing the additional arguments below may override the preset settings:

  • Coming soon
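For batch runs, the preset invocation shown earlier can be assembled from Python. The sketch below uses only the flags documented in this README (`-i`, `-o`, `-p`, `--save-npy true`, `--verbose`); the helper name `build_command` is hypothetical, and omitting `--save-npy` when it is not wanted is an assumption about the script's default.

```python
import sys

# Preset names as listed in this README.
PRESETS = ("fast", "fast1024", "full", "paper")


def build_command(input_path: str, output_dir: str, preset: str = "fast",
                  save_npy: bool = True, verbose: bool = True) -> list[str]:
    """Build the argument list for run_video.py (hypothetical helper)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}, expected one of {PRESETS}")
    cmd = [sys.executable, "run_video.py",
           "-i", input_path, "-o", output_dir, "-p", preset]
    if save_npy:
        # Only the documented form `--save-npy true` is emitted.
        cmd += ["--save-npy", "true"]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

From the project root, the resulting list could be passed to `subprocess.run(build_command("data/samples", "output/samples_fast"), check=True)`.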

⬇ Checkpoint cache

By default, the checkpoint is stored in the Hugging Face cache. The HF_HOME environment variable defines its location and can be overridden, e.g.:

export HF_HOME=$(pwd)/cache
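The same override can be applied from Python when the model is driven from a script or notebook rather than a shell. This is a minimal sketch with a hypothetical helper, `set_hf_home`; it assumes the variable is set before any Hugging Face library is imported, since the cache location is typically read at import time.

```python
import os
from pathlib import Path


def set_hf_home(cache_dir: str = "cache") -> str:
    """Point HF_HOME at `cache_dir` under the current working directory.

    Hypothetical helper; mirrors `export HF_HOME=$(pwd)/cache`.
    """
    hf_home = str(Path.cwd() / cache_dir)
    os.environ["HF_HOME"] = hf_home
    return hf_home


if __name__ == "__main__":
    # Call this before importing diffusers or huggingface_hub.
    print(set_hf_home())
```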

Alternatively, download the checkpoint weights locally with the following script, then specify the checkpoint path with -c checkpoint/rollingdepth-v1-0:

bash script/download_weight.sh

🦿 Evaluation on test datasets

Coming soon

πŸ™ Acknowledgments

We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.

We are grateful to redmond.ai for providing GPU resources.

🎫 License

The code of this work is licensed under the Apache License, Version 2.0 (as defined in LICENSE).

The model is licensed under the RAIL++-M License (as defined in LICENSE-MODEL).

By downloading and using the code and model, you agree to the terms in LICENSE and LICENSE-MODEL, respectively.