PVTv2

This is the Hugging Face PyTorch implementation of the PVTv2 model.

Model Description

The Pyramid Vision Transformer v2 (PVTv2) is a powerful, lightweight hierarchical transformer backbone for vision tasks. PVTv2 incorporates convolution operations into its transformer layers, giving the model the CNN-like inductive biases that make learning from image data efficient. This mixed transformer architecture requires no added positional embeddings and produces multi-scale feature maps, which are known to be beneficial for dense and fine-grained prediction tasks.
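A minimal feature-extraction sketch, assuming the checkpoint is loaded through the `transformers` library's PVTv2 classes (`AutoImageProcessor`, `PvtV2Model`); the dummy image and the printed shape are illustrative only:

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, PvtV2Model

# Dummy RGB image so the example runs without downloading data.
image = Image.fromarray(np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8))

# Assumes a recent transformers release that ships the PvtV2 classes
# and that the checkpoint provides a preprocessor config.
image_processor = AutoImageProcessor.from_pretrained("OpenGVLab/pvt_v2_b3")
model = PvtV2Model.from_pretrained("OpenGVLab/pvt_v2_b3")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Final-stage feature map; for a 224x224 input this should be roughly
# (batch, 512, 7, 7) for the b3 variant (1/32 of the input resolution).
print(outputs.last_hidden_state.shape)
```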

Vision models using PVTv2 as a backbone (a feature-pyramid sketch follows the list):

  1. Segformer for Semantic Segmentation.
  2. GLPN for Monocular Depth Estimation.
  3. Deformable DETR for 2D Object Detection.
  4. Panoptic Segformer for Panoptic Segmentation.
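The models above consume the multi-scale feature pyramid rather than a single feature map. A hedged sketch of pulling that pyramid out via the `transformers` `PvtV2Backbone` wrapper; the `out_indices` choice is an illustrative assumption, not a requirement of any of the models listed:

```python
import torch
from transformers import PvtV2Backbone

# Request all four stages; out_indices is an assumption about which
# stages a downstream dense-prediction head would use.
backbone = PvtV2Backbone.from_pretrained("OpenGVLab/pvt_v2_b3", out_indices=[0, 1, 2, 3])

pixel_values = torch.randn(1, 3, 224, 224)  # dummy batch
with torch.no_grad():
    outputs = backbone(pixel_values)

# One feature map per requested stage, at strides 4, 8, 16, and 32;
# for the b3 variant roughly (1, 64, 56, 56) down to (1, 512, 7, 7).
for feature_map in outputs.feature_maps:
    print(feature_map.shape)
```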