# 🛹 RollingDepth: Video Depth without Video Models

[![Website](doc/badges/badge-website.svg)](https://rollingdepth.github.io)
[![Hugging Face Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-green)](https://huggingface.co/prs-eth/rollingdepth-v1-0)

This repository is the official implementation of the paper "Video Depth without Video Models".

[Bingxin Ke](http://www.kebingxin.com/)<sup>1</sup>,
[Dominik Narnhofer](https://scholar.google.com/citations?user=tFx8AhkAAAAJ&hl=en)<sup>1</sup>,
[Shengyu Huang](https://shengyuh.github.io/)<sup>1</sup>,
[Lei Ke](https://www.kelei.site/)<sup>2</sup>,
[Torben Peters](https://scholar.google.com/citations?user=F2C3I9EAAAAJ&hl=de)<sup>1</sup>,
[Katerina Fragkiadaki](https://www.cs.cmu.edu/~katef/)<sup>2</sup>,
[Anton Obukhov](https://www.obukhov.ai/)<sup>1</sup>,
[Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en)<sup>1</sup>

<sup>1</sup>ETH Zurich, <sup>2</sup>Carnegie Mellon University

## 📢 News

2024-11-28: Inference code is released.

## 🛠️ Setup

The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090

### 📦 Repository

```bash
git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth
```

### 🐍 Python environment

Create a Python environment:

```bash
# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate

# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth
```

### 💻 Dependencies

Install dependencies:

```bash
pip install -r requirements.txt

# Install modified diffusers with cross-frame self-attention
bash script/install_diffusers_dev.sh
```

We use [pyav](https://github.com/PyAV-Org/PyAV) for video I/O, which relies on [ffmpeg](https://www.ffmpeg.org/).

## 🏃 Test on your videos

All scripts are designed to run from the project root directory.

### 📷 Prepare input videos

1. Use sample videos:

    ```bash
    bash script/download_sample_data.sh
    ```

2. Or place your videos in a directory, for example, under `data/samples`.

### 🚀 Run with presets

```bash
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    --save-npy true \
    --verbose
```

- `-p` or `--preset`: preset options
  - `fast` for **fast inference**, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
  - `fast1024` for **fast inference at resolution 1024**
  - `full` for **better details**, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
  - `paper` for **reproducing paper numbers**, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- `-i` or `--input-video`: path to input data, can be a single video file, a text file with video paths, or a directory of videos.
- `-o` or `--output-dir`: output directory.
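
The `-i` option accepts a text file with video paths, which can be handy when the videos live in nested folders. Below is a minimal sketch for generating such a file. It assumes the list holds one path per line (the exact format is not specified here). The helper `write_video_list` and the extension set are hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumption: which file extensions count as videos; adjust to your data.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def write_video_list(video_dir, list_path, extensions=VIDEO_EXTENSIONS):
    """Write the paths of all videos under video_dir to list_path.

    Assumption: one path per line. Returns the number of paths written.
    """
    video_dir = Path(video_dir)
    # Search recursively and sort so the list is stable between runs.
    videos = sorted(
        p for p in video_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    text = "".join(f"{p}\n" for p in videos)
    Path(list_path).write_text(text, encoding="utf-8")
    return len(videos)


if __name__ == "__main__":
    samples = Path("data/samples")
    if samples.is_dir():
        count = write_video_list(samples, "video_list.txt")
        print(f"Wrote {count} video paths to video_list.txt")
    else:
        print("data/samples not found; run this from the project root")
```

The resulting file could then be passed as `-i video_list.txt` in place of the directory.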

Passing other arguments below may override the preset settings:

- Coming soon

## ⬇ Checkpoint cache

By default, the [checkpoint](https://huggingface.co/prs-eth/rollingdepth-v1-0) is stored in the Hugging Face cache. The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

```bash
export HF_HOME=$(pwd)/cache
```

Alternatively, use the following script to download the checkpoint weights locally, then specify the checkpoint path with `-c checkpoint/rollingdepth-v1-0`:

```bash
bash script/download_weight.sh
```

## 🦿 Evaluation on test datasets

Coming soon

## 🙏 Acknowledgments

We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.

We are grateful to [redmond.ai](https://redmond.ai/) (robin@redmond.ai) for providing GPU resources.

## 🎫 License

The code of this work is licensed under the Apache License, Version 2.0 (as defined in the [LICENSE](LICENSE.txt)).

The model is licensed under the RAIL++-M License (as defined in the [LICENSE-MODEL](LICENSE-MODEL.txt)).

By downloading and using the code and model, you agree to the terms in [LICENSE](LICENSE.txt) and [LICENSE-MODEL](LICENSE-MODEL.txt), respectively.