DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference Paper • 2401.08671 • Published Jan 9, 2024 • 15
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22, 2024 • 17