You Do Not Fully Utilize Transformer's Representation Capacity • Paper • 2502.09245 • Published Feb 13 • 34
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published about 1 month ago • 145
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM • Paper • 2408.12076 • Published Aug 22, 2024 • 12