---

license: cc-by-nc-4.0
---



<div align="center">

# Omnidata (Steerable Datasets)
**A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)**

  
[`Project Website`](https://omnidata.vision) &centerdot; [`Paper`](https://arxiv.org/abs/2110.04994) &centerdot; [**`>> [Github] <<`**](https://github.com/EPFL-VILAB/omnidata#readme) &centerdot; [`Data`](https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/dataset#readme) &centerdot; [`Pretrained Weights`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_tools/torch#readme) &centerdot; [`Annotator`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_annotator#readme)



</div>





# DPT-Hybrid trained for surface normal estimation or depth estimation

DPT-Hybrid model: a hybrid Vision Transformer (ViT) backbone with a DPT (Dense Prediction Transformer) decoder.





## Intended uses & limitations

You can use this model for monocular surface normal estimation or depth estimation. 

* Normal: estimates surface normals, i.e., a per-pixel unit vector perpendicular to the surface (normal to its tangent plane).

* Depth: estimates normalized (relative) depth rather than metric depth (see the post-processing sketch after this list).


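The raw network outputs need light post-processing before use. Below is a minimal sketch, assuming the normal head outputs a 3-channel map encoded in [0, 1] and the depth head outputs a single-channel relative depth map; both are assumptions here, so check the Omnidata torch repo (linked above) for the exact output format.

```python
import torch

def decode_normals(pred: torch.Tensor) -> torch.Tensor:
    """Map a (3, H, W) prediction assumed to lie in [0, 1] to unit normal vectors in [-1, 1]."""
    n = pred * 2.0 - 1.0
    return n / n.norm(dim=0, keepdim=True).clamp(min=1e-6)

def normalize_depth_for_display(pred: torch.Tensor) -> torch.Tensor:
    """Relative depth has no metric scale; rescale per image to [0, 1] for visualization."""
    return (pred - pred.min()) / (pred.max() - pred.min() + 1e-6)
```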



## Models

Models that estimate surface normals or depth from RGB images.

* Architecture: [DPT](https://github.com/isl-org/DPT)

* Training resolutions: 384x384

* Training data: [Omnidata dataset](https://github.com/EPFL-VILAB/omnidata/tree/main)

* Input:

  * Dimensions: 384x384

  * Normalization: the normals model expects inputs in [0, 1]; the depth model expects inputs in [-1, 1] (see the preprocessing sketch below)


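A minimal preprocessing sketch following the input spec above. It assumes an already loaded DPT model object named `model` (weight loading is described in the Pretrained Weights repo linked above); the file name `example.png` and the output shapes in the comments are illustrative only.

```python
import torch
from PIL import Image
from torchvision import transforms

img = Image.open("example.png").convert("RGB")   # placeholder input image

# Surface-normal model: 384x384 input scaled to [0, 1]
normal_tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),                       # already yields [0, 1]
])

# Depth model: 384x384 input scaled to [-1, 1]
depth_tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

x = normal_tf(img).unsqueeze(0)                  # (1, 3, 384, 384)
with torch.no_grad():
    pred = model(x)                              # assumed: (1, 3, 384, 384) normals or (1, 384, 384) depth
```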



### BibTeX entry and citation info



```bibtex
@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}
```



If you use our latest pretrained models, please also cite the following paper for 3D data augmentations:



```bibtex
@inproceedings{kar20223d,
  title={3D Common Corruptions and Data Augmentation},
  author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18963--18974},
  year={2022}
}
```


> ...were you looking for the [research paper](//omnidata.vision/#paper) or [project website](//omnidata.vision)?