GameArena: Evaluating LLM Reasoning through Live Computer Games • Paper • 2412.06394 • Published Dec 9, 2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization • Paper • 2406.05981 • Published Jun 10, 2024
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models • Paper • 2406.07368 • Published Jun 11, 2024
Efficiently Serving LLM Reasoning Programs with Certaindex • Paper • 2412.20993 • Published Dec 30, 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding • Paper • 2402.02057 • Published Feb 3, 2024
PockEngine: Sparse and Efficient Fine-tuning in a Pocket • Paper • 2310.17752 • Published Oct 26, 2023
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput • Paper • 2406.14066 • Published Jun 20, 2024