Low-Rank Adapters Meet Neural Architecture Search for LLM Compression • Paper 2501.16372 • Published Jan 23, 2025
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models • Paper 2501.16937 • Published Jan 28, 2025
Identifying Sensitive Weights via Post-quantization Integral • Paper 2503.01901 • Published Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test • Paper 2503.01840 • Published Mar 2025
SEAP: Training-free Sparse Expert Activation Pruning Unlocks the Brainpower of Large Language Models • Paper 2503.07605 • Published Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs • Paper 2503.07067 • Published Mar 2025
Efficient Distillation of Classifier-Free Guidance using Adapters • Paper 2503.07274 • Published Mar 2025
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories • Paper 2503.07699 • Published Mar 2025