- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
  Paper • 2405.03594 • Published • 7
- Sparse Finetuning for Inference Acceleration of Large Language Models
  Paper • 2310.06927 • Published • 14
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
  Paper • 2203.07259 • Published • 3
Collections including paper arxiv:2310.06927
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Paper • 2310.17157 • Published • 12
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Paper • 2305.15805 • Published • 1
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
  Paper • 2305.11186 • Published • 1
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer
  Paper • 2110.07560 • Published • 1

- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 24
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 117
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 28
- NEFTune: Noisy Embeddings Improve Instruction Finetuning
  Paper • 2310.05914 • Published • 14

- Sparse Finetuning for Inference Acceleration of Large Language Models
  Paper • 2310.06927 • Published • 14
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
  Paper • 2203.07259 • Published • 3
- How Well Do Sparse Imagenet Models Transfer?
  Paper • 2111.13445 • Published • 1