license: cc-by-nc-4.0

Omnidata (Steerable Datasets)

A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)

Project Website · Paper · >> [Github] << · Data · Pretrained Weights · Annotator

DPT-Hybrid trained for surface normal estimation or depth estimation

Vision Transformer (ViT) model trained using a DPT (Dense Prediction Transformer) decoder.

Intended uses & limitations

You can use this model for monocular surface normal estimation or depth estimation.

  • Normal: estimates surface normals. At each pixel, the normal is a unit vector perpendicular to the tangent plane of the surface.
  • Depth: estimates normalized depth, i.e. relative depth rather than metric depth.
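
To make the two output types above concrete, here is a minimal, illustrative sketch in plain Python. It is not taken from the Omnidata code base: the helper names `unit_normal` and `relative_depth` are hypothetical, and min-max scaling is only one assumed way of turning metric depth into a relative quantity (the model card does not specify the exact normalization).

```python
import math


def unit_normal(tangent_u, tangent_v):
    """Return the unit vector perpendicular to the plane spanned by two tangents."""
    ux, uy, uz = tangent_u
    vx, vy, vz = tangent_v
    # The cross product is perpendicular to both tangent vectors.
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("tangent vectors must not be parallel or zero")
    return tuple(c / length for c in n)


def relative_depth(depths):
    """Min-max scale depths to [0, 1] (assumed scheme); absolute scale is discarded."""
    lo, hi = min(depths), max(depths)
    if hi == lo:
        return [0.0 for _ in depths]
    return [(d - lo) / (hi - lo) for d in depths]
```

Under this assumed scheme, `relative_depth([1.0, 2.0, 3.0])` and `relative_depth([10.0, 20.0, 30.0])` give the same result, which is the sense in which a relative depth map carries ordering and proportions but not metric distance.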

Models

Models to estimate surface normals or depth from RGB images.

  • Architecture: DPT
  • Training resolutions: 384x384
  • Training data: Omnidata dataset
  • Input:
    • Dimensions: 384x384
    • Normalization: input values scaled per task (normals: [0, 1], depth: [-1, 1])
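
The input specification above can be sketched as a small preprocessing helper. This is a hedged illustration, not the official pipeline: it assumes 8-bit RGB input and a linear mapping onto the listed range, the names `INPUT_SIZE`, `INPUT_RANGE`, `scale_pixel`, and `scale_image` are hypothetical, and resizing to 384x384 is left to whatever image library you already use.

```python
# Values taken from the model card; everything else here is an assumption.
INPUT_SIZE = (384, 384)  # expected height and width after resizing
INPUT_RANGE = {"normal": (0.0, 1.0), "depth": (-1.0, 1.0)}


def scale_pixel(value, task):
    """Linearly map one 8-bit channel value (0..255) onto the task's input range."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit channel value in 0..255")
    lo, hi = INPUT_RANGE[task]
    return lo + (hi - lo) * (value / 255.0)


def scale_image(rows, task):
    """Apply scale_pixel to an image stored as rows of (R, G, B) pixels."""
    return [[[scale_pixel(c, task) for c in pixel] for pixel in row] for row in rows]
```

For example, `scale_image(image, "depth")` would be applied to an already resized image before passing it to the depth model, and `scale_image(image, "normal")` before passing it to the normal model.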

BibTeX entry and citation info

@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}

If you use our latest pretrained models, please also cite the following paper for 3D data augmentations:

@inproceedings{kar20223d,
  title={3D Common Corruptions and Data Augmentation},
  author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18963--18974},
  year={2022}
}

...were you looking for the research paper or project website?