---
library_name: transformers
tags: [Structured Pruning, Phi-2, Memory-efficient Pruning]
---

# Model Card for Pruned Phi-2 (1.8B)

We prune the Phi-2 (2.7B) model to 35% sparsity (1.8B parameters) and then fine-tune it on 100K sequences of length 2048 from the [C4 dataset](https://huggingface.co/datasets/c4).
Our pruning algorithm is described in the paper [Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes](https://arxiv.org/abs/2402.05406).
Code for the pruning algorithm is available in the [Bonsai repository](https://github.com/ldery/Bonsai/tree/main).
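
The pruned model is intended to be loadable like any other 🤗 transformers causal LM. Below is a minimal, hedged loading sketch: the repository id is a placeholder (this card does not state the Hub path), and `trust_remote_code` may or may not be required depending on whether the pruned architecture ships custom modeling code.

```python
# Minimal usage sketch; the repo id below is a placeholder for this model's Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model-repo-id>"  # hypothetical placeholder, replace with the actual Hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 keeps the 1.8B model comfortably on a single GPU
    device_map="auto",
    trust_remote_code=True,      # may be needed if the pruned architecture ships custom code
)

prompt = "Structured pruning of large language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```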

## Model Details
This model is derived by pruning the [Phi-2 model](https://huggingface.co/microsoft/phi-2).

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/ldery/Bonsai/tree/main
- **Paper:** https://arxiv.org/abs/2402.05406



## Training Details

### Training Data

Fine-tuned on 100K sequences of length 2048 from the [C4 dataset](https://huggingface.co/datasets/c4).
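
For concreteness, here is a hedged sketch of how a fine-tuning set of this shape could be assembled with 🤗 `datasets`; the streaming and chunking strategy is an illustrative assumption, not the authors' exact preprocessing pipeline.

```python
# Sketch: stream ~100K sequences of length 2048 from C4.
# Chunking strategy is assumed for illustration, not taken from the authors' pipeline.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
# "allenai/c4" is the canonical Hub name for the C4 dataset linked above.
stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

SEQ_LEN, NUM_SEQS = 2048, 100_000
buffer, sequences = [], []
for example in stream:
    buffer.extend(tokenizer(example["text"])["input_ids"])
    # Cut the running token buffer into fixed-length training sequences.
    while len(buffer) >= SEQ_LEN and len(sequences) < NUM_SEQS:
        sequences.append(torch.tensor(buffer[:SEQ_LEN]))
        buffer = buffer[SEQ_LEN:]
    if len(sequences) >= NUM_SEQS:
        break
```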

### Training Procedure 

Full fine-tuning of all remaining parameters, with a knowledge-distillation term (see the KL weight below).


#### Training Hyperparameters

- **Distillation KL weight:** 0.01
- **Learning rate:** 1e-4
- **Batch size:** 128
- **Optimizer:** AdamW
- **Warmup steps:** 5
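
The sketch below illustrates how these hyperparameters might fit together in a single distillation-style fine-tuning step. It is an illustrative reconstruction, not the authors' training code: the pruned `student`, the dense Phi-2 `teacher`, the AdamW optimizer (lr 1e-4) and the warmup scheduler are assumed to be constructed by the caller, and the loss mixes the usual language-modeling cross-entropy with a KL term against the teacher's logits weighted by 0.01.

```python
# Illustrative sketch of one distillation fine-tuning step (not the authors' code).
import torch
import torch.nn.functional as F

KL_WEIGHT = 0.01  # weight on the distillation term, per the hyperparameters above

def distillation_step(student, teacher, optimizer, scheduler, batch):
    """One optimizer step: LM cross-entropy plus a KL term against the dense teacher.

    `student` is the pruned model, `teacher` the dense Phi-2, and `batch` a dict
    with `input_ids` and `attention_mask` for length-2048 sequences.
    """
    labels = batch["input_ids"]
    student_logits = student(**batch).logits
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits

    # Standard next-token prediction loss on the shifted sequence.
    ce = F.cross_entropy(
        student_logits[:, :-1].reshape(-1, student_logits.size(-1)),
        labels[:, 1:].reshape(-1),
    )
    # KL divergence between the student's and the teacher's token distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss = ce + KL_WEIGHT * kl
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```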



## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA A6000

## Citation


**BibTeX:**

@misc{dery2024everybody,
      title={Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes}, 
      author={Lucio Dery and Steven Kolawole and Jean-François Kagy and Virginia Smith and Graham Neubig and Ameet Talwalkar},
      year={2024},
      eprint={2402.05406},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

## Model Card Authors

Lucio Dery: [email protected]

## Model Card Contact

[email protected]