MobA: A Two-Level Agent System for Efficient Mobile Task Automation Paper • 2410.13757 • Published Oct 17, 2024 • 32
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10, 2024 • 23
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs Paper • 2402.03804 • Published Feb 6, 2024 • 2
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10, 2024 • 36
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40