Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2303.15105

Papers - Image - Attention - Window

Vision Transformer with Quadrangle Attention

Paper • 2303.15105 • Published Mar 27, 2023 • 2
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Paper • 2103.14030 • Published Mar 25, 2021 • 4
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition

Paper • 2209.01620 • Published Aug 31, 2022 • 2
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Paper • 2107.00652 • Published Jul 1, 2021 • 2

Papers - QFormer

Vision Transformer with Quadrangle Attention

Paper • 2303.15105 • Published Mar 27, 2023 • 2
Language Grounded QFormer for Efficient Vision Language Understanding

Paper • 2311.07449 • Published Nov 13, 2023 • 2
MultiBooth: Towards Generating All Your Concepts in an Image from Text

Paper • 2404.14239 • Published Apr 22 • 8

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Paper • 2403.06775 • Published Mar 11 • 3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper • 2010.11929 • Published Oct 22, 2020 • 7
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

Paper • 2110.07040 • Published Oct 13, 2021 • 2
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks

Paper • 1811.00056 • Published Oct 31, 2018 • 2

Papers - Training Research

Measuring the Effects of Data Parallelism on Neural Network Training

Paper • 1811.03600 • Published Nov 8, 2018 • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Paper • 1804.04235 • Published Apr 11, 2018 • 2
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper • 1905.11946 • Published May 28, 2019 • 3
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62

Papers - Attention

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16 • 79
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23 • 19
Sequence Parallelism: Long Sequence Training from System Perspective

Paper • 2105.13120 • Published May 26, 2021 • 5

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs