TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval Paper • 2502.20969 • Published Feb 28, 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Paper • 2502.20583 • Published Feb 27, 2025
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Paper • 2402.07033 • Published Feb 10, 2024