vlbthambawita's Collections
Transformer-based Models for Computer Vision
MIO: A Foundation Model on Multimodal Tokens (arXiv:2409.17692)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arXiv:2010.11929)
Going deeper with Image Transformers (arXiv:2103.17239)
Training data-efficient image transformers & distillation through attention (arXiv:2012.12877)
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (arXiv:2103.14030)
Masked Autoencoders Are Scalable Vision Learners (arXiv:2111.06377)
DINOv2: Learning Robust Visual Features without Supervision (arXiv:2304.07193)
Emerging Properties in Self-Supervised Vision Transformers (arXiv:2104.14294)
BEiT: BERT Pre-Training of Image Transformers (arXiv:2106.08254)
Learning Transferable Visual Models From Natural Language Supervision (arXiv:2103.00020)
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers (arXiv:2106.10270)
Biomedical SAM 2: Segment Anything in Biomedical Images and Videos (arXiv:2408.03286)
SAM 2: Segment Anything in Images and Videos (arXiv:2408.00714)