DarwinLM is an evolutionary structured pruning method for large language models. It is built on an evolutionary search process: in each generation, multiple offspring models are produced through mutation, and only the fittest survive to the next round. The resulting compressed models substantially reduce the computational cost of LLM inference, which is especially important for real-time applications.

Paper: https://arxiv.org/pdf/2502.07780
Code: https://github.com/IST-DASLab/DarwinLM
Models: DarwinLM-2.7B, DarwinLM-4.6B, DarwinLM-8.4B
Pruned Models without Post-training: DarwinLM-2.7B-Pruned, DarwinLM-4.6B-Pruned, DarwinLM-8.4B-Pruned


This repository contains the weights of DarwinLM, as introduced in our paper. The model can be loaded with Hugging Face Transformers:

# Please add trust_remote_code=True as the repo includes custom code to load and run DarwinLM
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-2.7B-Pruned", trust_remote_code=True)
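
For a quick check that the weights load and run, generation works through the standard Transformers API. This is a minimal sketch: the prompt and generation settings are illustrative, and it assumes the repository ships its tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Shengkun/DarwinLM-2.7B-Pruned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Encode a short prompt and greedily decode a continuation
inputs = tokenizer("Structured pruning of large language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))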

Downstream Tasks

Task abbreviations: WG = WinoGrande, ArcE/ArcC = ARC-Easy/ARC-Challenge, HS = HellaSwag.

2.7B

| Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| Dense | 6.7B | 93.7 | 78.1 | 69.3 | 76.4 | 53.0 | 78.6 | 30.7 | 77.7 | 69.2 |
| Uniform | 3.4B | 44.1 | 57.1 | 53.3 | 33.5 | 32.2 | 27.3 | 25.0 | 49.0 | 40.1 |
| ZipLM | 4.0B | 87.4 | 64.4 | 58.3 | 53.2 | 33.6 | 50.1 | 25.5 | 63.6 | 54.5 |
| ShearedLLaMA | 2.7B | 84.5 | 66.4 | 53.4 | 49.8 | 28.4 | 47.6 | 27.6 | 50.9 | 51.0 |
| DarwinLM (one-shot) | 2.7B | 85.6 | 70.8 | 55.8 | 63.3 | 38.1 | 53.2 | 28.5 | 62.7 | 57.2 |
| ShearedLLaMA (50B) | 2.7B | 90.8 | 75.8 | 64.2 | 67.0 | 41.2 | 70.8 | 28.2 | 63.0 | 62.6 |
| ShearedLLaMA (10B†) | 2.7B | 92.0 | 73.6 | 63.1 | 69.8 | 42.0 | 64.4 | 29.0 | 62.1 | 61.9 |
| DarwinLM (10B) | 2.6B | 90.8 | 72.2 | 65.1 | 68.5 | 45.0 | 67.2 | 28.5 | 64.6 | 62.8 |

4.6B

| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-8B | Dense | 8B | 96.3 | 81.2 | 74.3 | 81.4 | 58.2 | 81.7 | 31.1 | 84.0 | 65.2 | 72.8 |
| | Uniform | 4.5B | 29.1 | 53.6 | 51.7 | 26.0 | 23.6 | 27.1 | 25.5 | 62.1 | 25.7 | 36.1 |
| | ZipLM | 6B | 65.5 | 60.6 | 56.0 | 40.2 | 34.4 | 34.4 | 28.1 | 63.0 | 27.9 | 45.7 |
| | DarwinLM (one-shot) | 4.6B | 84.9 | 69.4 | 57.3 | 59.6 | 34.2 | 44.6 | 24.1 | 62.2 | 28.5 | 51.6 |
| | OLMO (2.5T) | 7B | 92.8 | 79.4 | 70.4 | 73.3 | 44.9 | 77.1 | 27.9 | 72.5 | 28.3 | 62.9 |
| | DarwinLM (10.0B) | 4.6B | 93.2 | 74.8 | 67.4 | 73.2 | 51.6 | 71.3 | 30.7 | 71.1 | 40.6 | 63.7 |

8.4B

| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-14B-Instruct | Dense | 14B | 96.8 | 81.9 | 79.1 | 85.7 | 72.8 | 85.1 | 38.5 | 87.9 | 80.0 | 78.6 |
| | Uniform | 8.6B | 78.2 | 72.7 | 57.6 | 76.1 | 45.6 | 47.0 | 28.1 | 61.6 | 45.5 | 56.9 |
| | ZipLM | 8.5B | 69.0 | 66.4 | 52.8 | 60.1 | 38.3 | 43.3 | 29.6 | 60.2 | 25.0 | 49.4 |
| | DarwinLM (one-shot) | 8.4B | 84.3 | 73.9 | 60.5 | 75.7 | 48.0 | 53.3 | 29.3 | 66.9 | 43.1 | 59.4 |
| | OLMO-0424 (2.05T) | 7B | 96.1 | 80.1 | 72.1 | 73.8 | 49.2 | 78.0 | 29.3 | 80.8 | 52.1 | 67.9 |
| | DarwinLM (10.0B) | 8.4B | 89.5 | 78.1 | 70.7 | 79.6 | 57.6 | 74.9 | 33.5 | 73.9 | 57.9 | 68.4 |

Installation

conda env create -f environment.yml
conda activate darwinlm

Database Preparation

Generate the database of compressed layer variants (obtained with ZipLM-type structured pruning at multiple sparsity levels) that the evolutionary search later selects from:

# For llama-2-7B
bash scripts/ziplm_llama2-7B.sh
# ... other model examples

Evolutionary Search

bash scripts/struct_prune_search.sh
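
This runs the evolutionary search over the compressed-layer database to find a per-layer sparsity configuration. For intuition only, the sketch below illustrates the generate-mutate-select loop; the function and variable names are ours rather than the repository's API, and it omits details such as keeping the overall sparsity budget fixed during mutation and the training-aware selection described in the paper.

import copy
import random

def evolutionary_search(parent, fitness, generations=100, offspring=16,
                        mutation_rate=0.1, num_levels=8):
    # `parent` is a list of per-layer sparsity levels; `fitness` scores a
    # candidate, e.g. negative calibration loss of the assembled model.
    best, best_score = parent, fitness(parent)
    for _ in range(generations):
        candidates = []
        for _ in range(offspring):
            child = copy.deepcopy(best)
            # Mutation: randomly reassign the sparsity level of a few layers
            for i in range(len(child)):
                if random.random() < mutation_rate:
                    child[i] = random.randrange(num_levels)
            candidates.append(child)
        # Selection: only the fittest candidate survives to the next generation
        for child in candidates:
            score = fitness(child)
            if score > best_score:
                best, best_score = child, score
    return best

# Hypothetical usage: 32 layers, 8 discrete sparsity levels per layer
# config = evolutionary_search([0] * 32, my_fitness_fn)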

Post-Training

After pruning, you can further fine-tune the model on the FineWeb-Edu dataset using the llm-foundry repository. Refer to our paper for the parameter settings.
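
The paper's post-training uses llm-foundry. Purely as an illustration of the idea (continued pretraining of the pruned model on FineWeb-Edu), here is a minimal sketch with the Hugging Face Trainer; the dataset config, sample count, and all hyperparameters are placeholders rather than the paper's settings.

from datasets import Dataset, load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "Shengkun/DarwinLM-2.7B-Pruned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-style tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Stream a small number of FineWeb-Edu documents, just for illustration
stream = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                      split="train", streaming=True)
texts = [ex["text"] for _, ex in zip(range(1000), stream)]
train_ds = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="darwinlm-2.7b-posttrained",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,   # placeholder
    learning_rate=1e-4,               # placeholder; see the paper for actual settings
    num_train_epochs=1,
    logging_steps=10,
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()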

Evaluation

Install the lm-evaluation-harness.

Option 1: Using pre-trained weights:

bash scripts/run_lmeval_hf.sh

Option 2: Evaluating your searched structure:

bash scripts/run_lmeval_config.sh
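
For HF-format weights you can also call the harness from Python. This is a sketch: the simple_evaluate API and task names below assume a recent lm-evaluation-harness (v0.4+), and the batch size is arbitrary.

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Shengkun/DarwinLM-2.7B-Pruned,trust_remote_code=True",
    tasks=["sciq", "piqa", "winogrande", "arc_easy", "arc_challenge",
           "hellaswag", "logiqa", "boolq"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, comparable to the tables above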

BibTeX

@article{tang2025darwinlm,
  title={DarwinLM: Evolutionary Structured Pruning of Large Language Models},
  author={Tang, Shengkun and Sieberling, Oliver and Kurtic, Eldar and Shen, Zhiqiang and Alistarh, Dan},
  journal={arXiv preprint arXiv:2502.07780},
  year={2025}
}