Paper: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089, published Feb 16)
Paper: LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning (arXiv:2501.09767, published Jan 15)
Article: Perceiver IO: a scalable, fully-attentional model that works on any modality (Dec 15, 2021)