- LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
  Paper • 2310.18356 • Published • 22
- LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
  Paper • 2305.18403 • Published • 2
- Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling
  Paper • 2305.08285 • Published • 1
- A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
  Paper • 2310.08797 • Published • 1

Collections including paper arxiv:2305.18403

- FP8-LM: Training FP8 Large Language Models
  Paper • 2310.18313 • Published • 33
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers
  Paper • 2310.16836 • Published • 13
- TEQ: Trainable Equivalent Transformation for Quantization of LLMs
  Paper • 2310.10944 • Published • 9
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1

- LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
  Paper • 2310.18356 • Published • 22
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 25
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44

- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Paper • 2310.17157 • Published • 12
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Paper • 2305.15805 • Published • 1
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
  Paper • 2305.11186 • Published • 1
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer
  Paper • 2110.07560 • Published • 1

- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 25
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2