sashasax committed
Commit 387d2be · 1 Parent(s): 81723c2

add checkpoint

Files changed (2):
  1. README.md +58 -0
  2. omnidata_normal_dpt_hybrid.pth +3 -0
README.md CHANGED
@@ -1,3 +1,61 @@
---
license: cc-by-nc-4.0
---

<div align="center">

# Omnidata (Steerable Datasets)
**A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)**

[`Project Website`](https://omnidata.vision) &centerdot; [`Paper`](https://arxiv.org/abs/2110.04994) &centerdot; [**`>> [Github] <<`**](https://github.com/EPFL-VILAB/omnidata#readme) &centerdot; [`Data`](https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/dataset#readme) &centerdot; [`Pretrained Weights`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_tools/torch#readme) &centerdot; [`Annotator`](https://github.com/EPFL-VILAB/omnidata-tools/tree/main/omnidata_annotator#readme)

</div>

# DPT-Hybrid trained for surface normal estimation or depth estimation
A Vision Transformer (ViT) model trained with a DPT (Dense Prediction Transformer) decoder.

## Intended uses & limitations
You can use this model for monocular surface normal estimation or depth estimation; a minimal inference sketch follows the list below.
* Normal: estimates surface normals, a unit vector perpendicular to the surface's tangent plane at each pixel.
* Depth: estimates normalized depth, i.e. relative rather than metric depth.
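
The sketch below shows one way to load the surface normal checkpoint from this repo and run it on a single image. It assumes the `DPTDepthModel` class from the Omnidata GitHub repo (under `omnidata_tools/torch/modules/midas/`); the import path, constructor arguments, and checkpoint key layout are assumptions that may need adjusting to your checkout, so treat this as a starting point rather than a verified recipe.

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumption: DPTDepthModel is the DPT-Hybrid implementation shipped in the
# Omnidata GitHub repo (omnidata_tools/torch/modules/midas/dpt_depth.py).
from modules.midas.dpt_depth import DPTDepthModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 3 output channels for surface normals (the depth checkpoint would use 1).
model = DPTDepthModel(backbone="vitb_rn50_384", num_channels=3)

ckpt = torch.load("omnidata_normal_dpt_hybrid.pth", map_location=device)
if isinstance(ckpt, dict) and "state_dict" in ckpt:
    # Some Omnidata checkpoints wrap the weights and prefix keys with "model.".
    state_dict = {k.replace("model.", "", 1): v for k, v in ckpt["state_dict"].items()}
else:
    state_dict = ckpt
model.load_state_dict(state_dict)
model.to(device).eval()

# Match the 384x384 training resolution; ToTensor keeps RGB values in [0, 1].
preprocess = transforms.Compose([
    transforms.Resize(384),
    transforms.CenterCrop(384),
    transforms.ToTensor(),
])

img = preprocess(Image.open("example.png").convert("RGB")).unsqueeze(0).to(device)
with torch.no_grad():
    normals = model(img).clamp(0, 1)  # surface normals encoded in [0, 1]
```

The repository linked under `Pretrained Weights` above contains the authors' full demo code, which also covers the depth model and visualization.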

## Models
Models that estimate surface normals or depth from RGB images.
* Architecture: [DPT](https://github.com/isl-org/DPT)
* Training resolution: 384x384
* Training data: [Omnidata dataset](https://github.com/EPFL-VILAB/omnidata/tree/main)
* Input (see the preprocessing sketch after this list):
  * Dimensions: 384x384
  * Normalization: normals in [0, 1], depth in [-1, 1]
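
A short sketch of preprocessing consistent with the input spec above, using torchvision. The file name `example.png` is only a placeholder, and the transforms in the official demo may differ in detail (e.g. cropping strategy); this just illustrates the listed resolution and normalization ranges.

```python
from PIL import Image
from torchvision import transforms

# Normal model: 384x384 RGB with values in [0, 1].
normal_transform = transforms.Compose([
    transforms.Resize(384),
    transforms.CenterCrop(384),
    transforms.ToTensor(),  # scales pixels to [0, 1]
])

# Depth model: same geometry, but shifted and scaled to [-1, 1].
depth_transform = transforms.Compose([
    transforms.Resize(384),
    transforms.CenterCrop(384),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # [0, 1] -> [-1, 1]
])

img = Image.open("example.png").convert("RGB")
x_normal = normal_transform(img).unsqueeze(0)  # shape (1, 3, 384, 384)
x_depth = depth_transform(img).unsqueeze(0)
```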

### BibTeX entry and citation info

```bibtex
@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}
```

If you use our latest pretrained models, please also cite the following paper on 3D data augmentations:

```bibtex
@inproceedings{kar20223d,
  title={3D Common Corruptions and Data Augmentation},
  author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18963--18974},
  year={2022}
}
```
<!-- <img src="https://raw.githubusercontent.com/alexsax/omnidata-tools/main/docs/images/omnidata_front_page.jpg?token=ABHLE3LC3U64F2QRVSOBSS3BPED24" alt="Website main page" style='max-width: 100%;'/> -->
> ...were you looking for the [research paper](//omnidata.vision/#paper) or [project website](//omnidata.vision)?
omnidata_normal_dpt_hybrid.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1af4506385ef4c828af559309ec89428833c005d7ecbcf921c4b12f84c2f62df
size 492716590
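
The checkpoint itself is stored via Git LFS, so the three lines above are the pointer, not the weights. Below is a small sketch for verifying that a fully downloaded `omnidata_normal_dpt_hybrid.pth` matches the pointer's `oid` (SHA-256) and `size`; the local path is whatever location you saved the file to.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "1af4506385ef4c828af559309ec89428833c005d7ecbcf921c4b12f84c2f62df"
EXPECTED_SIZE = 492716590  # bytes, from the LFS pointer

path = Path("omnidata_normal_dpt_hybrid.pth")

# Hash in 1 MB chunks so the ~470 MB file is never held in memory all at once.
h = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert path.stat().st_size == EXPECTED_SIZE, "size mismatch: partial or pointer-only download?"
assert h.hexdigest() == EXPECTED_SHA256, "sha256 mismatch: corrupted download?"
print("checkpoint verified")
```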