🛹 RollingDepth: Video Depth without Video Models

This repository is the official implementation of the paper "Video Depth without Video Models".

Bingxin Ke¹, Dominik Narnhofer¹, Shengyu Huang¹, Lei Ke², Torben Peters¹, Katerina Fragkiadaki², Anton Obukhov¹, Konrad Schindler¹

¹ETH Zurich, ²Carnegie Mellon University

📢 News

2024-11-28: Inference code is released.

🛠️ Setup

The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090

📦 Repository

```
git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth
```

🐍 Python environment

Create a Python environment:

```
# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate

# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth
```

💻 Dependencies

Install dependencies:

```
pip install -r requirements.txt

# Install modified diffusers with cross-frame self-attention
bash script/install_diffusers_dev.sh
```

We use pyav for video I/O, which relies on ffmpeg.
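As a quick sanity check before running inference, the sketch below reports whether the `python` and `ffmpeg` executables are on the PATH. The helper name `have_cmd` is hypothetical and not part of this repository. Whether a system-wide ffmpeg is actually needed may depend on how pyav was installed, so a "missing" result is a hint rather than a definite error.

```shell
# Return success (exit status 0) if the named command is available on PATH.
# `have_cmd` is an illustrative helper, not part of RollingDepth.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the tools mentioned in this README; purely informational.
for tool in python ffmpeg; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```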

🏃 Test on your videos

All scripts are designed to run from the project root directory.

📷 Prepare input videos

Use sample videos:
```
bash script/download_sample_data.sh
```
Or place your videos in a directory, for example, under data/samples.

🚀 Run with presets

```
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    --save-npy true \
    --verbose
```

- -p or --preset: preset options
  - fast for fast inference, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
  - fast1024 for fast inference at resolution 1024.
  - full for better details, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
  - paper for reproducing paper numbers, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- -i or --input-video: path to input data, can be a single video file, a text file with video paths, or a directory of videos.
- -o or --output-dir: output directory.
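Since -i also accepts a text file with video paths, such a list can be generated from a directory. The sketch below assumes the list file holds one path per line and that the videos use the .mp4 extension; neither is confirmed by this README. The function name `make_video_list` and the file name video_list.txt are hypothetical.

```shell
# Write the paths of all .mp4 files under a directory to a list file,
# one path per line (assumed format), sorted for a stable order.
# `make_video_list` is an illustrative helper, not part of RollingDepth.
make_video_list() {
    src_dir=$1
    list_file=$2
    find "$src_dir" -type f -name '*.mp4' | sort > "$list_file"
}

# Example usage (commented out so the snippet has no side effects):
# make_video_list data/samples video_list.txt
# python run_video.py -i video_list.txt -o output/samples_fast -p fast
```

Building the list explicitly makes it easy to drop individual videos from a batch by editing the file before the run.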

Passing the other arguments below may override the preset settings:

Coming soon

⬇ Checkpoint cache

By default, the checkpoint is stored in the Hugging Face cache. The HF_HOME environment variable defines its location and can be overridden, e.g.:

```
export HF_HOME=$(pwd)/cache
```
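To see which directory is in effect, the sketch below prints HF_HOME when it is set and otherwise the usual Hugging Face default (huggingface under XDG_CACHE_HOME, or under ~/.cache). It approximates the library's lookup in plain shell, and the function name `effective_hf_home` is hypothetical.

```shell
# Print the directory Hugging Face libraries are expected to use as their home:
# HF_HOME if set and non-empty, otherwise "huggingface" under the user cache dir.
# This is an approximation of the library's own logic, not a call into it.
effective_hf_home() {
    echo "${HF_HOME:-${XDG_CACHE_HOME:-$HOME/.cache}/huggingface}"
}

effective_hf_home
```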

Alternatively, download the checkpoint weights locally with the following script, then pass the checkpoint path with -c checkpoint/rollingdepth-v1-0:

```
bash script/download_weight.sh
```
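One possible way to combine both options is to pass -c only when the local checkpoint directory exists, and otherwise let the run fall back to the Hugging Face cache. This is a minimal sketch; the helper name `checkpoint_args` is hypothetical, and it assumes the checkpoint path contains no spaces.

```shell
# Echo "-c <dir>" if the local checkpoint directory exists, otherwise nothing,
# so the command falls back to its default checkpoint location.
# `checkpoint_args` is an illustrative helper, not part of RollingDepth.
checkpoint_args() {
    ckpt_dir=$1
    if [ -d "$ckpt_dir" ]; then
        echo "-c $ckpt_dir"
    fi
}

# Example usage (commented out so the snippet has no side effects).
# The command substitution is left unquoted on purpose so that "-c" and the
# path are passed as two separate arguments:
# python run_video.py -i data/samples -o output/samples_fast -p fast \
#     $(checkpoint_args checkpoint/rollingdepth-v1-0)
```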

🦿 Evaluation on test datasets

Coming soon

🙏 Acknowledgments

We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.

We are grateful to redmond.ai for providing GPU resources.

🎫 License

The code of this work is licensed under the Apache License, Version 2.0 (as defined in LICENSE).

The model is licensed under the RAIL++-M License (as defined in LICENSE-MODEL).

By downloading and using the code and model, you agree to the terms in LICENSE and LICENSE-MODEL, respectively.