# Depth Anything V2 Estimator Block

A custom Modular Diffusers block for monocular depth estimation using Depth Anything V2. Supports both images and videos.
## Features
- Relative depth estimation using Depth Anything V2 (Large variant, 335M params)
- Image and video input support
- Grayscale or turbo colormap visualization
## Installation

```bash
# Using uv
uv sync

# Using pip
pip install -r requirements.txt
```
## Quick Start

### Load the block

```python
from diffusers import ModularPipelineBlocks
import torch

blocks = ModularPipelineBlocks.from_pretrained(
    "your-username/depth-anything-v2-estimator",  # or local path "."
    trust_remote_code=True,
)
pipeline = blocks.init_pipeline()
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")
```
### Single image - grayscale depth

```python
from PIL import Image

image = Image.open("photo.jpg")
output = pipeline(image=image)

# Save depth map
output.depth_image.save("photo_depth.png")

# Access raw relative depth tensor
print(output.predicted_depth.shape)  # (H, W)
```
### Single image - turbo colormap

```python
output = pipeline(image=image, colormap="turbo")
output.depth_image.save("photo_depth_turbo.png")
```
### Video - grayscale depth

```python
from block import save_video

output = pipeline(video_path="input.mp4", colormap="grayscale")
save_video(output.depth_frames, output.fps, "output_depth.mp4")
```
### Video - turbo colormap

```python
output = pipeline(video_path="input.mp4", colormap="turbo")
save_video(output.depth_frames, output.fps, "output_depth_turbo.mp4")
```
## Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| `image` | `PIL.Image` | - | Image to estimate depth for |
| `video_path` | `str` | - | Path to input video. When provided, `image` is ignored |
| `colormap` | `str` | `"grayscale"` | `"grayscale"` or `"turbo"` (colormapped) |
## Outputs

### Image mode

| Output | Type | Description |
|---|---|---|
| `depth_image` | `PIL.Image` | Normalized depth visualization |
| `predicted_depth` | `torch.Tensor` | Raw relative depth (H × W) |

### Video mode

| Output | Type | Description |
|---|---|---|
| `depth_frames` | `List[PIL.Image]` | Per-frame depth visualizations |
| `fps` | `float` | Source video frame rate |
## Depth Normalization
Depth values are min-max normalized and inverted so that bright areas represent nearby surfaces and dark areas represent distant ones.
- Bright = close, dark = far (grayscale)
- Warm (red/yellow) = close, cool (blue) = far (turbo)
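The normalization described above can be sketched as follows. `normalize_depth` is a hypothetical helper shown for illustration (not part of the block's API), and it assumes larger raw values mean farther away:

```python
import numpy as np

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a relative depth map to uint8, inverting it so
    near surfaces become bright (255) and far surfaces dark (0).

    Hypothetical helper for illustration; assumes larger raw values
    mean farther away.
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # min-max to [0, 1]
    d = 1.0 - d                                     # invert: near -> bright
    return (d * 255.0).astype(np.uint8)
```

Passing the result to `PIL.Image.fromarray(..., mode="L")` would then yield a grayscale visualization like the block's `depth_image`.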
## Model Variants

The block defaults to `depth-anything/Depth-Anything-V2-Large-hf`. Other available variants:

| Variant | Model ID | Params |
|---|---|---|
| Small | `depth-anything/Depth-Anything-V2-Small-hf` | 24.8M |
| Base | `depth-anything/Depth-Anything-V2-Base-hf` | 97.5M |
| Large (default) | `depth-anything/Depth-Anything-V2-Large-hf` | 335M |
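For scripting which checkpoint to load, the table above can be captured in a small lookup. `VARIANT_IDS` and `resolve_variant` are hypothetical conveniences, not part of the block's API:

```python
# Hypothetical mapping of variant names to Hub model IDs (from the table above)
VARIANT_IDS = {
    "small": "depth-anything/Depth-Anything-V2-Small-hf",
    "base": "depth-anything/Depth-Anything-V2-Base-hf",
    "large": "depth-anything/Depth-Anything-V2-Large-hf",
}

def resolve_variant(name: str = "large") -> str:
    """Return the Hub model ID for a variant name, defaulting to Large."""
    try:
        return VARIANT_IDS[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown variant {name!r}; choose from {sorted(VARIANT_IDS)}")
```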