DarwinLM is an evolutionary structured pruning method for large language models. It is built on an evolutionary search process that, in each generation, produces multiple offspring models through mutation and keeps only the fittest for survival. The resulting compressed models significantly reduce the computational cost of LLM inference, which is especially important for real-time applications.
- Paper: https://arxiv.org/pdf/2502.07780
- Code: https://github.com/IST-DASLab/DarwinLM
- Models: DarwinLM-2.7B, DarwinLM-4.6B, DarwinLM-8.4B
- Pruned models without post-training: DarwinLM-2.7B-Pruned, DarwinLM-4.6B-Pruned, DarwinLM-8.4B-Pruned
This repository contains the weights of DarwinLM, as introduced in our paper. The checkpoints can be loaded with `transformers`:
```python
# Please add trust_remote_code=True as the repo includes custom code to load and run DarwinLM
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-2.7B-Pruned", trust_remote_code=True)
```
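Once loaded, the checkpoint behaves like any other `transformers` causal LM. The snippet below is a minimal usage sketch, not part of the official examples; it assumes the model repo also ships a matching tokenizer, and the prompt and generation settings are arbitrary.

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Shengkun/DarwinLM-2.7B-Pruned", trust_remote_code=True)

prompt = "Structured pruning of large language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```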
## Downstream Tasks

### 2.7B
| Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| Dense | 6.7B | 93.7 | 78.1 | 69.3 | 76.4 | 53.0 | 78.6 | 30.7 | 77.7 | 69.2 |
| Uniform | 3.4B | 44.1 | 57.1 | 53.3 | 33.5 | 32.2 | 27.3 | 25.0 | 49.0 | 40.1 |
| ZipLM | 4.0B | 87.4 | 64.4 | 58.3 | 53.2 | 33.6 | 50.1 | 25.5 | 63.6 | 54.5 |
| Sheared-LLaMA | 2.7B | 84.5 | 66.4 | 53.4 | 49.8 | 28.4 | 47.6 | 27.6 | 50.9 | 51.0 |
| DarwinLM (one-shot) | 2.7B | 85.6 | 70.8 | 55.8 | 63.3 | 38.1 | 53.2 | 28.5 | 62.7 | 57.2 |
| Sheared-LLaMA (50B) | 2.7B | 90.8 | 75.8 | 64.2 | 67.0 | 41.2 | 70.8 | 28.2 | 63.0 | 62.6 |
| Sheared-LLaMA (10B†) | 2.7B | 92.0 | 73.6 | 63.1 | 69.8 | 42.0 | 64.4 | 29.0 | 62.1 | 61.9 |
| DarwinLM (10B) | 2.6B | 90.8 | 72.2 | 65.1 | 68.5 | 45.0 | 67.2 | 28.5 | 64.6 | 62.8 |
### 4.6B
| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-8B | Dense | 8B | 96.3 | 81.2 | 74.3 | 81.4 | 58.2 | 81.7 | 31.1 | 84.0 | 65.2 | 72.8 |
| | Uniform | 4.5B | 29.1 | 53.6 | 51.7 | 26.0 | 23.6 | 27.1 | 25.5 | 62.1 | 25.7 | 36.1 |
| | ZipLM | 6B | 65.5 | 60.6 | 56.0 | 40.2 | 34.4 | 34.4 | 28.1 | 63.0 | 27.9 | 45.7 |
| | DarwinLM (one-shot) | 4.6B | 84.9 | 69.4 | 57.3 | 59.6 | 34.2 | 44.6 | 24.1 | 62.2 | 28.5 | 51.6 |
| | OLMo (2.5T) | 7B | 92.8 | 79.4 | 70.4 | 73.3 | 44.9 | 77.1 | 27.9 | 72.5 | 28.3 | 62.9 |
| | DarwinLM (10.0B) | 4.6B | 93.2 | 74.8 | 67.4 | 73.2 | 51.6 | 71.3 | 30.7 | 71.1 | 40.6 | 63.7 |
### 8.4B
| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-14B-Instruct | Dense | 14B | 96.8 | 81.9 | 79.1 | 85.7 | 72.8 | 85.1 | 38.5 | 87.9 | 80.0 | 78.6 |
| | Uniform | 8.6B | 78.2 | 72.7 | 57.6 | 76.1 | 45.6 | 47.0 | 28.1 | 61.6 | 45.5 | 56.9 |
| | ZipLM | 8.5B | 69.0 | 66.4 | 52.8 | 60.1 | 38.3 | 43.3 | 29.6 | 60.2 | 25.0 | 49.4 |
| | DarwinLM (one-shot) | 8.4B | 84.3 | 73.9 | 60.5 | 75.7 | 48.0 | 53.3 | 29.3 | 66.9 | 43.1 | 59.4 |
| | OLMo-0424 (2.05T) | 7B | 96.1 | 80.1 | 72.1 | 73.8 | 49.2 | 78.0 | 29.3 | 80.8 | 52.1 | 67.9 |
| | DarwinLM (10.0B) | 8.4B | 89.5 | 78.1 | 70.7 | 79.6 | 57.6 | 74.9 | 33.5 | 73.9 | 57.9 | 68.4 |
## Installation

```bash
conda env create -f environment.yml
conda activate darwinlm
```
## Database Preparation

```bash
# For llama-2-7B
bash scripts/ziplm_llama2-7B.sh
# ... other model examples
```
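Conceptually, this step builds, for each transformer layer, a set of structurally pruned variants at increasing sparsity levels (following ZipLM); the evolutionary search then assigns one level per layer. The sketch below only illustrates that data layout; names like `SparsityDatabase` and `total_params` are hypothetical and do not match the repository's on-disk format.

```python
from typing import Dict, List

# Hypothetical layout: database[layer_index][level] holds the pruned weights
# for that layer at that sparsity level (level 0 = dense, higher = sparser).
SparsityDatabase = Dict[int, List[dict]]

def total_params(levels: List[int], params_per_level: List[List[int]]) -> int:
    """Parameter count of the model described by one per-layer level assignment."""
    return sum(params_per_level[layer][level] for layer, level in enumerate(levels))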
## Evolutionary Search

```bash
bash scripts/struct_prune_search.sh
```
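For intuition, the search follows the mutation-and-selection loop described at the top of this card: each generation mutates the per-layer sparsity-level assignment and keeps only the fittest offspring. The following is a simplified sketch, not the repository's implementation; it omits the training-aware selection steps described in the paper, and `evaluate_fitness` (e.g. calibration loss under a fixed parameter budget) is a placeholder.

```python
import copy
import random

def mutate(levels, num_levels, num_swaps=2):
    """Perturb a per-layer level assignment while roughly preserving the overall
    budget by pairing a +1 move on one layer with a -1 move on another."""
    child = copy.copy(levels)
    for _ in range(num_swaps):
        i, j = random.sample(range(len(child)), 2)
        if child[i] + 1 < num_levels and child[j] > 0:
            child[i] += 1  # prune layer i more aggressively
            child[j] -= 1  # relax pruning on layer j
    return child

def evolutionary_search(parent, num_levels, evaluate_fitness,
                        generations=20, offspring_per_generation=8):
    best, best_fitness = parent, evaluate_fitness(parent)
    for _ in range(generations):
        for _ in range(offspring_per_generation):
            child = mutate(best, num_levels)
            fitness = evaluate_fitness(child)
            if fitness < best_fitness:  # lower loss = fitter
                best, best_fitness = child, fitness
    return best
```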
## Post-Training
After pruning, you can further fine-tune the model with the Fineweb-edu dataset using the llm-foundry repository. Refer to our paper for parameter settings.
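The actual post-training runs through llm-foundry, but as an illustration of the data source only, Fineweb-edu can be streamed from the Hugging Face Hub as shown below; the `sample-10BT` subset and the slice size are arbitrary choices for this sketch.

```python
from datasets import load_dataset

# Stream a public Fineweb-edu subset instead of downloading the full corpus.
ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

for example in ds.take(3):
    print(example["text"][:200])
```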
## Evaluation

Install the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).

Option 1: Using the pre-trained weights:

```bash
bash scripts/run_lmeval_hf.sh
```

Option 2: Evaluating your searched structure:

```bash
bash scripts/run_lmeval_config.sh
```
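Alternatively, the harness can be called directly from Python. The sketch below uses the lm-evaluation-harness v0.4+ `simple_evaluate` API; the task list is a subset of the benchmarks in the tables above, the batch size is arbitrary, and you can swap in any of the released checkpoints.

```python
import lm_eval

# trust_remote_code is required because the DarwinLM repos ship custom modeling code.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Shengkun/DarwinLM-2.7B-Pruned,trust_remote_code=True",
    tasks=["sciq", "piqa", "winogrande", "arc_easy", "arc_challenge", "boolq"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```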
## BibTeX

```bibtex
@article{tang2025darwinlm,
  title={DarwinLM: Evolutionary Structured Pruning of Large Language Models},
  author={Tang, Shengkun and Sieberling, Oliver and Kurtic, Eldar and Shen, Zhiqiang and Alistarh, Dan},
  journal={arXiv preprint arXiv:2502.07780},
  year={2025}
}
```