---
library_name: transformers
tags:
- Structured Pruning
- Phi-2
- Memory-efficient Pruning
license: mit
language:
- en
---

# Model Card for Bonsai-PrunedPhi-1.8B

We prune the Phi-2 (2.7B) model to 35% sparsity (1.8B parameters) and then fine-tune it on 100K sequences of length 2048 from the C4 dataset (https://huggingface.co/datasets/c4).

Our pruning algorithm is described in the paper [Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes](https://arxiv.org/abs/2402.05406). [Code for the pruning algorithm can be found here](https://github.com/ldery/Bonsai/tree/main).

## Model Details

This model is derived by pruning the [Phi-2 model](https://huggingface.co/microsoft/phi-2).

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/ldery/Bonsai/tree/main
- **Paper:** https://arxiv.org/abs/2402.05406

## Training Details

### Training Data

Fine-tuned on 100K sequences of length 2048 from the C4 dataset (https://huggingface.co/datasets/c4).

### Training Procedure

Full fine-tuning.

#### Training Hyperparameters

- Distillation KL weight: 0.01
- Learning rate: 1e-4
- Batch size: 128
- Optimizer: AdamW
- Warmup steps: 5

### License

The model is licensed under the [MIT license](https://huggingface.co/luciodery/Bonsai-PrunedPhi-1.8B/blob/main/LICENSE).

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA A6000

## Citation

**BibTeX:**

    @misc{dery2024everybody,
          title={Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes},
          author={Lucio Dery and Steven Kolawole and Jean-Francois Kagy and Virginia Smith and Graham Neubig and Ameet Talwalkar},
          year={2024},
          eprint={2402.05406},
          archivePrefix={arXiv},
          primaryClass={cs.LG}
    }

## Model Card Authors

Lucio Dery: ldery@andrew.cmu.edu

## Model Card Contact

ldery@andrew.cmu.edu
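For reference, the distillation term in the training hyperparameters (KL weight 0.01) can be sketched as below. This is an illustrative NumPy formulation of a standard objective, combining next-token cross-entropy with a KL term against a frozen teacher; the exact loss used in the Bonsai codebase may differ.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, kl_weight=0.01):
    """Cross-entropy on hard labels plus kl_weight * KL(teacher || student)."""
    student_probs = softmax(student_logits)  # shape: (num_tokens, vocab)
    teacher_probs = softmax(teacher_logits)
    # Standard next-token cross-entropy against the ground-truth token ids.
    ce = -np.mean(np.log(student_probs[np.arange(len(labels)), labels]))
    # KL divergence from the teacher's distribution to the student's.
    kl = np.mean(np.sum(teacher_probs * (np.log(teacher_probs) - np.log(student_probs)), axis=-1))
    return ce + kl_weight * kl
```

With `kl_weight=0.01` the hard-label cross-entropy dominates and the teacher acts as a light regularizer, which matches the small KL weight reported above.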
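## How to Get Started with the Model

The pruned checkpoint should load like any other causal LM in 🤗 transformers. A minimal sketch follows; the repo id is taken from the license link in this card, while the prompt format and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "luciodery/Bonsai-PrunedPhi-1.8B"  # repo id from the license link above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Phi-2-style "Instruct:/Output:" prompt; adjust to your use case.
prompt = "Instruct: Explain structured pruning in one sentence.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Depending on your transformers version, loading may additionally require `trust_remote_code=True`, as was historically the case for Phi-2.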