---
license: cc-by-nc-4.0
---
<div align="center">
# Omnidata (Steerable Datasets)
**A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)**
[`Project Website`](https://omnidata.vision) · [`Paper`](https://arxiv.org/abs/2110.04994) · [**`>> [Github] <<`**](https://github.com/EPFL-VILAB/omnidata#readme) · [`Data`](https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/dataset#readme) · [`Pretrained Weights`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_tools/torch#readme) · [`Annotator`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_annotator#readme) ·
</div>
# DPT-Hybrid trained for surface normal estimation or depth estimation
Vision Transformer (ViT) model trained using a DPT (Dense Prediction Transformer) decoder.
## Intended uses & limitations
You can use this model for monocular surface normal estimation or depth estimation (a usage sketch follows the list below).
* Normal: estimates surface normals, a unit vector perpendicular to the tangent plane of the surface at each pixel.
* Depth: estimates normalized depth, i.e. relative rather than metric depth.
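Below is a minimal sketch of how one might run inference with the depth variant in PyTorch. It assumes the `DPTDepthModel` class from the [DPT](https://github.com/isl-org/DPT) codebase and a locally downloaded checkpoint; the checkpoint filename, constructor arguments, and example image path are illustrative, not the exact API shipped with this model (the normal variant additionally needs a 3-channel output head).

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumption: DPTDepthModel comes from the DPT codebase (https://github.com/isl-org/DPT);
# the checkpoint filename below is illustrative.
from dpt.models import DPTDepthModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# DPT-Hybrid backbone, as used for the Omnidata models.
model = DPTDepthModel(backbone="vitb_rn50_384")
checkpoint = torch.load("omnidata_dpt_depth.ckpt", map_location=device)
state_dict = checkpoint.get("state_dict", checkpoint)  # some checkpoints nest the weights
model.load_state_dict(state_dict, strict=False)
model.to(device).eval()

# Resize to the 384x384 training resolution and scale RGB to [-1, 1],
# the input range listed for the depth model (the normal model expects [0, 1]).
transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

image = Image.open("example.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    depth = model(x)  # relative (normalized) depth map
```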
## Models
Models to estimate surface normals or depth from RGB images.
* Architecture: [DPT](https://github.com/isl-org/DPT)
* Training resolution: 384x384
* Training data: [Omnidata dataset](https://github.com/EPFL-VILAB/omnidata/tree/main)
* Input:
  * Dimensions: 384x384
  * Normalization: the normals model expects input in [0, 1]; the depth model expects input in [-1, 1] (illustrated in the sketch below)
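A hedged illustration of these two input conventions using torchvision transforms; the preprocessing used in the Omnidata code may differ in detail, so treat this split as an assumption:

```python
from torchvision import transforms

# Both models take 384x384 RGB input.
# Normal model: input in [0, 1] (ToTensor already produces this range).
normal_transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
])

# Depth model: input in [-1, 1], i.e. x -> 2*x - 1 applied after ToTensor.
depth_transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```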
### BibTeX entry and citation info
```bibtex
@inproceedings{eftekhar2021omnidata,
title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={10786--10796},
year={2021}
}
```
If you use our latest pretrained models, please also cite the following paper on 3D data augmentations:
```bibtex
@inproceedings{kar20223d,
title={3D Common Corruptions and Data Augmentation},
author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18963--18974},
year={2022}
}
```
> ...were you looking for the [research paper](//omnidata.vision/#paper) or [project website](//omnidata.vision)?