Model Card for Bonsai-Pruned Phi-2 (1.8B)
We prune the Phi-2 (2.7B) model to 35% sparsity (1.8B) and then finetune it on 100K sequences of length 2048 from the C4 dataset (https://huggingface.co/datasets/c4). Our pruning algorithm is described in the paper Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes. Code for the pruning algorithm is available at https://github.com/ldery/Bonsai.
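For reference, a minimal usage sketch with 🤗 transformers follows. The repo id below is a placeholder (substitute the actual Hub id of this checkpoint), and trust_remote_code may or may not be needed depending on how the pruned architecture is packaged.

```python
# Minimal usage sketch; "<this-repo-id>" is a placeholder, not a real Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # may be required if the pruned architecture ships custom code
)

inputs = tokenizer("Structured pruning keeps the model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```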
Model Details
This model is derived by pruning the Phi-2 model.
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
- Model type: Decoder-only
- Language(s) (NLP): English
- License: MIT
Model Sources
- Repository: https://github.com/ldery/Bonsai/tree/main
- Paper: https://arxiv.org/abs/2402.05406
Training Details
Training Data
Finetuned on 100K sequences of length 2048 tokens from the C4 dataset (https://huggingface.co/datasets/c4).
Training Procedure
Full fine-tuning.
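A minimal sketch of how such a finetuning set could be assembled, assuming the English configuration of C4 streamed from the Hub and the Phi-2 tokenizer; this is illustrative, not the authors' exact preprocessing script.

```python
# Sketch: stream English C4 and pack it into 100K fixed-length 2048-token sequences.
# Dataset id/config ("allenai/c4", "en") and the packing scheme are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

def pack_sequences(dataset, seq_len=2048, num_sequences=100_000):
    """Concatenate tokenized documents and cut them into fixed-length chunks."""
    buffer, produced = [], 0
    for example in dataset:
        buffer.extend(tokenizer(example["text"])["input_ids"])
        while len(buffer) >= seq_len and produced < num_sequences:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
            produced += 1
        if produced >= num_sequences:
            return

# Example: peek at the first couple of packed sequences
for i, seq in enumerate(pack_sequences(stream)):
    if i >= 2:
        break
    print(len(seq))  # 2048
```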
Training Hyperparameters
- Distillation KL-Weight: 0.01 (see the loss sketch after this list)
- Learning Rate: 1e-4
- Batch Size: 128
- Optimizer: AdamW
- Warmup Steps: 5
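The Distillation KL-Weight above suggests the finetuning loss combines the usual next-token cross-entropy with a KL term against the unpruned Phi-2 teacher. Below is a hedged sketch of such a training step; the exact objective is defined in the paper, and the function and variable names here are illustrative.

```python
# Illustrative sketch of one distillation-augmented fine-tuning step.
# Assumes `batch` contains input_ids, attention_mask, and labels.
# e.g. optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)  # matches the listed settings
import torch
import torch.nn.functional as F

kl_weight = 0.01  # Distillation KL-Weight listed above

def distillation_step(student, teacher, batch, optimizer):
    student_out = student(**batch)
    with torch.no_grad():
        teacher_out = teacher(**batch)

    # Standard next-token cross-entropy on the C4 data
    lm_loss = student_out.loss

    # KL divergence between student and teacher next-token distributions
    vocab = student_out.logits.size(-1)
    kl = F.kl_div(
        F.log_softmax(student_out.logits.reshape(-1, vocab), dim=-1),
        F.softmax(teacher_out.logits.reshape(-1, vocab), dim=-1),
        reduction="batchmean",
    )

    loss = lm_loss + kl_weight * kl
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```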
License
The model is licensed under the MIT license.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA A6000
Citation
BibTeX:
@misc{dery2024everybody,
  title={Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes},
  author={Lucio Dery and Steven Kolawole and Jean-Francois Kagy and Virginia Smith and Graham Neubig and Ameet Talwalkar},
  year={2024},
  eprint={2402.05406},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
Model Card Authors
Lucio Dery: [email protected]