Low-Rank Adapters Meet Neural Architecture Search for LLM Compression • Paper 2501.16372 • Published Jan 23, 2025
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models • Paper 2501.16937 • Published Jan 28, 2025
Identifying Sensitive Weights via Post-quantization Integral • Paper 2503.01901 • Published Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test • Paper 2503.01840 • Published Mar 2025
SEAP: Training-free Sparse Expert Activation Pruning Unlocks the Brainpower of Large Language Models • Paper 2503.07605 • Published Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs • Paper 2503.07067 • Published Mar 2025
Efficient Distillation of Classifier-Free Guidance using Adapters • Paper 2503.07274 • Published Mar 2025
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories • Paper 2503.07699 • Published Mar 2025