LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning Paper • 2510.14211 • Published Oct 16 • 7
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Paper • 2505.13866 • Published May 20 • 17
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation Paper • 2502.01068 • Published Feb 3 • 18
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks Paper • 2402.09025 • Published Feb 14, 2024 • 9