---
license: mit
---
# TAC depth encoder

<!-- Provide a quick summary of what the model is/does. -->

This model encodes a depth image into a dense feature.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

The model is pre-trained with an RGB-D contrastive objective named TAC (Time-Aware Contrastive pre-training).
Unlike InfoNCE-based loss functions, TAC leverages the similarity between video frames and estimates a similarity matrix as soft labels.
The backbone of this version is ViT-B/32.
The pre-training is conducted on a new unified RGB-D database, UniRGBD.

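For intuition only, below is a minimal sketch of a soft-label contrastive loss in the spirit of the description above. It is an illustrative assumption, not the actual TAC training code; the function name, the `frame_times` input, and the Gaussian kernel width `sigma` are hypothetical, and the exact similarity estimator used in the paper may differ (see the repository for the real implementation).

```python
import torch
import torch.nn.functional as F

def soft_label_contrastive_loss(rgb_emb, depth_emb, frame_times, tau=0.07, sigma=1.0):
    """rgb_emb, depth_emb: (N, D) L2-normalized embeddings of N video frames.
    frame_times: (N,) frame timestamps used to build soft similarity targets."""
    # Soft labels: frames close in time are treated as partially positive pairs
    # (hypothetical Gaussian kernel over time differences).
    dt = frame_times[:, None] - frame_times[None, :]
    soft_targets = torch.softmax(-(dt ** 2) / (2 * sigma ** 2), dim=-1)

    # Cross-modal similarity logits, CLIP-style
    logits = rgb_emb @ depth_emb.t() / tau

    # Cross-entropy against the soft similarity matrix instead of one-hot labels
    loss_r2d = -(soft_targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    loss_d2r = -(soft_targets * F.log_softmax(logits.t(), dim=-1)).sum(-1).mean()
    return 0.5 * (loss_r2d + loss_d2r)
```
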
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [TAC](https://github.com/RavenKiller/TAC)
- **Paper:** [Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training](https://ieeexplore.ieee.org/document/10288539)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Uses

```python
from transformers import CLIPImageProcessor, CLIPVisionModel
from PIL import Image
import numpy as np

tac_depth_model = CLIPVisionModel.from_pretrained("RavenK/TAC-ViT-base")
tac_depth_processor = CLIPImageProcessor.from_pretrained("RavenK/TAC-ViT-base")

# Assume test.png is a depth image with a scale factor of 1000
MIN_DEPTH = 0.0
MAX_DEPTH = 10.0
DEPTH_SCALE = 1000

depth_path = "test.png"
depth = Image.open(depth_path)
depth = np.array(depth).astype("float32") / DEPTH_SCALE  # to meters
depth = np.clip(depth, MIN_DEPTH, MAX_DEPTH)  # clip to [MIN_DEPTH, MAX_DEPTH]
depth = (depth - MIN_DEPTH) / (MAX_DEPTH - MIN_DEPTH)  # normalize to [0, 1]
depth = np.expand_dims(depth, axis=2).repeat(3, axis=2)  # extend to 3 channels
depth = tac_depth_processor(depth, do_rescale=False, return_tensors="pt").pixel_values  # preprocess (resize, normalize, convert to tensor)

outputs = tac_depth_model(pixel_values=depth)
outputs = outputs["last_hidden_state"][:, 0, :]  # CLS token feature before any projection head; may be used for downstream fine-tuning
```

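As a hypothetical follow-up (not part of the original card), the resulting embeddings can be compared with cosine similarity; `depth_a` and `depth_b` below stand for two tensors produced by the same preprocessing as `depth` above.

```python
import torch

# Inference only, so gradients are disabled
with torch.no_grad():
    feat_a = tac_depth_model(pixel_values=depth_a)["last_hidden_state"][:, 0, :]
    feat_b = tac_depth_model(pixel_values=depth_b)["last_hidden_state"][:, 0, :]

# Cosine similarity between the two depth embeddings
similarity = torch.nn.functional.cosine_similarity(feat_a, feat_b)
print(similarity.item())
```
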
### Other Uses

Please refer to our code repository for more details.

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

```bibtex
@ARTICLE{10288539,
  author={He, Zongtao and Wang, Liuyi and Dang, Ronghao and Li, Shu and Yan, Qingqing and Liu, Chengju and Chen, Qijun},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training},
  year={2023},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TCSVT.2023.3326373}}
```