TAC RGB encoder

This model encodes an RGB image into a dense feature.

Caution: the model does not contain the last FC layer, so the output features are not aligned with the depth features.
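
To make the caution concrete, here is a minimal sketch of what a final FC (projection) layer does on top of the backbone feature. The feature sizes (768 and 512), the helper name `project`, and the random weights are assumptions chosen only for illustration. They are not read from this checkpoint, and the trained FC weights are not part of this model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
BACKBONE_DIM = 768  # assumed width of the feature this encoder returns
SHARED_DIM = 512    # assumed width of a shared RGB-depth embedding space

# Stand-in for the features this encoder returns for a batch of 4 images.
rgb_backbone_feat = rng.standard_normal((4, BACKBONE_DIM))

# Stand-in for the missing FC layer. The weights are random here because
# the trained layer is not shipped with this model.
fc_weight = rng.standard_normal((BACKBONE_DIM, SHARED_DIM)) / np.sqrt(BACKBONE_DIM)
fc_bias = np.zeros(SHARED_DIM)


def project(feat, weight, bias):
    """Apply a linear layer, then L2-normalize each row."""
    out = feat @ weight + bias
    return out / np.linalg.norm(out, axis=-1, keepdims=True)


rgb_shared = project(rgb_backbone_feat, fc_weight, fc_bias)
print(rgb_backbone_feat.shape, rgb_shared.shape)
```

Without such a layer, the backbone features are still usable as generic image features (for example as input to a downstream head), but comparing them directly with depth embeddings is not meaningful.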

Model Details

Model Description

The model is pre-trained with an RGB-D contrastive objective named TAC (Time-Aware Contrastive pre-training). Unlike InfoNCE-based loss functions, TAC leverages the similarity between video frames and estimates a similarity matrix that serves as soft labels. The backbone of this version is ViT-B/32. Pre-training is conducted on UniRGBD, a new unified RGB-D database. The main purpose of this work is depth representation, so the RGB encoder is only a side model.
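
The difference between one-hot InfoNCE targets and soft labels can be sketched as follows. This is not the paper's exact formulation: the exponential time-decay rule in `time_based_soft_labels`, the parameter `tau`, the temperature value, and all function names are assumptions made for illustration. The cited paper gives the actual similarity estimate.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def time_based_soft_labels(frame_times, video_ids, tau=2.0):
    """Hypothetical soft labels: frames of the same video that are close
    in time receive more weight; frames of other videos receive zero."""
    dt = np.abs(frame_times[:, None] - frame_times[None, :])
    same_video = video_ids[:, None] == video_ids[None, :]
    scores = np.where(same_video, -dt / tau, -np.inf)
    return softmax(scores, axis=1)  # each row sums to 1


def soft_label_contrastive_loss(rgb_feat, depth_feat, soft_labels, temperature=0.07):
    """Symmetric cross-entropy between RGB-depth logits and soft labels."""
    rgb = rgb_feat / np.linalg.norm(rgb_feat, axis=-1, keepdims=True)
    depth = depth_feat / np.linalg.norm(depth_feat, axis=-1, keepdims=True)
    logits = rgb @ depth.T / temperature
    # The labels above come from symmetric scores, so the same
    # row-normalized matrix is used as the target in both directions.
    loss_r2d = -(soft_labels * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_d2r = -(soft_labels * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return 0.5 * (loss_r2d + loss_d2r)


# Toy batch: three frames from video 0 and two frames from video 1.
rng = np.random.default_rng(0)
frame_times = np.array([0.0, 1.0, 2.0, 0.0, 1.0])
video_ids = np.array([0, 0, 0, 1, 1])
labels = time_based_soft_labels(frame_times, video_ids)
rgb = rng.standard_normal((5, 16))
depth = rng.standard_normal((5, 16))
loss = soft_label_contrastive_loss(rgb, depth, labels)
print(labels.round(2))
print(float(loss))
```

Passing an identity matrix as `soft_labels` reduces this loss to the usual symmetric InfoNCE-style cross-entropy, which shows how soft labels generalize the one-hot case.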

Model Sources

Citation

@ARTICLE{10288539,
  author={He, Zongtao and Wang, Liuyi and Dang, Ronghao and Li, Shu and Yan, Qingqing and Liu, Chengju and Chen, Qijun},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Learning Depth Representation From RGB-D Videos by Time-Aware Contrastive Pre-Training}, 
  year={2024},
  volume={34},
  number={6},
  pages={4143-4158},
  doi={10.1109/TCSVT.2023.3326373}}
Model size: 87.5M params (Safetensors). Tensor type: F32.