Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems Paper • 2312.15234 • Published Dec 23, 2023
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning Paper • 2402.18789 • Published Feb 29, 2024
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models Paper • 2401.07159 • Published Jan 13, 2024
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification Paper • 2305.09781 • Published May 16, 2023