MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression • Paper 2406.14909 • Published Jun 21
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs • Paper 2401.03868 • Published Jan 8