add checkpoint
- README.md +58 -0
- omnidata_normal_dpt_hybrid.pth +3 -0
README.md
CHANGED
@@ -1,3 +1,61 @@
---
license: cc-by-nc-4.0
---


<div align="center">

# Omnidata (Steerable Datasets)
**A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)**


[`Project Website`](https://omnidata.vision) · [`Paper`](https://arxiv.org/abs/2110.04994) · [**`>> [Github] <<`**](https://github.com/EPFL-VILAB/omnidata#readme) · [`Data`](https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/dataset#readme) · [`Pretrained Weights`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_tools/torch#readme) · [`Annotator`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_annotator#readme)

</div>


# DPT-Hybrid trained for surface normal estimation or depth estimation
Vision Transformer (ViT) model trained using a DPT (Dense Prediction Transformer) decoder.

## Intended uses & limitations
You can use this model for monocular surface normal estimation or depth estimation from a single RGB image (a loading and inference sketch follows the list below).
* Normal: estimates surface normals, a unit vector perpendicular to the surface at each pixel.
* Depth: estimates normalized depth, i.e. relative rather than metric depth.
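
The checkpoint is a plain PyTorch weights file, so it can be loaded with `torch.load` and plugged into a DPT-Hybrid model. The snippet below is a minimal sketch, not the official demo: it assumes the DPT codebase linked under Models (https://github.com/isl-org/DPT) is importable as `dpt`, and the class name `DPTDepthModel` and the `backbone="vitb_rn50_384"` argument follow that upstream repo; the Omnidata fork may use a slightly different constructor (for example an extra output-channel argument for the 3-channel normal head), so treat these names as assumptions.

```python
# Minimal sketch (see assumptions above): load the normal checkpoint into a
# DPT-Hybrid model and run one forward pass on a dummy 384x384 RGB batch.
import torch
from dpt.models import DPTDepthModel  # assumed import path, as in the upstream DPT repo

model = DPTDepthModel(backbone="vitb_rn50_384")  # "vitb_rn50_384" is the DPT-Hybrid backbone upstream

checkpoint = torch.load("omnidata_normal_dpt_hybrid.pth", map_location="cpu")
# Some releases wrap the weights under a 'state_dict' key; handle both layouts.
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
model.load_state_dict(state_dict, strict=False)  # strict=False: head shapes may differ between forks
model.eval()

with torch.no_grad():
    rgb = torch.rand(1, 3, 384, 384)   # one image at the training resolution
    prediction = model(rgb)            # normals: 3 channels per pixel; depth: 1 channel
print(prediction.shape)
```

For real inputs, replace the random tensor with a preprocessed image (see the preprocessing sketch in the Models section below).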


## Models
Models to estimate surface normals or depth from RGB images (a preprocessing sketch matching these conventions follows the list).
* Architecture: [DPT](https://github.com/isl-org/DPT)
* Training resolution: 384x384
* Training data: [Omnidata dataset](https://github.com/EPFL-VILAB/omnidata/tree/main)
* Input:
  * Dimensions: 384x384
  * Normalization: normals in [0, 1], depth in [-1, 1]
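
To make the input conventions above concrete, here is a small preprocessing sketch using torchvision. It encodes only what the bullets state (384x384 input, normal-model input in [0, 1], depth-model input in [-1, 1]); the helper name `load_rgb` and the example path are placeholders, and the transforms shipped with the Omnidata demo scripts may differ in detail.

```python
# Hypothetical preprocessing matching the bullets above: resize to 384x384 and
# normalize to [0, 1] for the normal model or [-1, 1] for the depth model.
import torch
from PIL import Image
from torchvision import transforms

def load_rgb(path: str, task: str = "normal") -> torch.Tensor:
    """Return a (1, 3, 384, 384) tensor ready for the model."""
    to_tensor = transforms.Compose([
        transforms.Resize((384, 384)),
        transforms.ToTensor(),        # converts to float and scales pixels to [0, 1]
    ])
    x = to_tensor(Image.open(path).convert("RGB"))
    if task == "depth":
        x = 2.0 * x - 1.0             # map [0, 1] -> [-1, 1] for the depth model
    return x.unsqueeze(0)             # add the batch dimension

# Example usage (the path is a placeholder):
# batch = load_rgb("example.png", task="normal")
```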


### BibTeX entry and citation info

```bibtex
@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}
```

If you use our latest pretrained models, please also cite the following paper on 3D data augmentations:

```bibtex
@inproceedings{kar20223d,
  title={3D Common Corruptions and Data Augmentation},
  author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18963--18974},
  year={2022}
}
```

> ...were you looking for the [research paper](//omnidata.vision/#paper) or [project website](//omnidata.vision)?
omnidata_normal_dpt_hybrid.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1af4506385ef4c828af559309ec89428833c005d7ecbcf921c4b12f84c2f62df
size 492716590
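
Cloning the repository without Git LFS leaves only the three-line pointer above in place of the ~490 MB weights file. After downloading the real checkpoint, the sha256 and size recorded in the pointer can be used as a sanity check; the sketch below assumes the file sits in the current working directory.

```python
# Verify a downloaded checkpoint against the Git LFS pointer shown above.
import hashlib
import os

EXPECTED_SHA256 = "1af4506385ef4c828af559309ec89428833c005d7ecbcf921c4b12f84c2f62df"
EXPECTED_SIZE = 492716590  # bytes, from the pointer file

path = "omnidata_normal_dpt_hybrid.pth"  # assumed local download location

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        digest.update(chunk)

assert os.path.getsize(path) == EXPECTED_SIZE, "unexpected size (pointer-only or partial download?)"
assert digest.hexdigest() == EXPECTED_SHA256, "sha256 mismatch"
print("checkpoint verified")
```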