|
--- |
|
library_name: transformers |
|
tags: |
|
- Structured Pruning |
|
- Phi-2 |
|
- Memory-efficient Pruning |
|
license: mit |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for Bonsai-PrunedPhi-1.8B
|
|
|
We prune the Phi-2 (2.7B) model to 35% sparsity (1.8B parameters) and then fine-tune it on 100K sequences of length 2048 from the [C4 dataset](https://huggingface.co/datasets/c4).
|
Our pruning algorithm is described in the paper [Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes](https://arxiv.org/abs/2402.05406). |
|
Code for the pruning algorithm is available in the [Bonsai repository](https://github.com/ldery/Bonsai/tree/main).
|
|
|
## Model Details |
|
The model is derived by pruning the [Phi-2 model](https://huggingface.co/microsoft/phi-2).
|
|
|
### Model Description |
|
|
|
|
|
|
Bonsai-PrunedPhi-1.8B is a 🤗 transformers decoder-only language model obtained by structured pruning of Phi-2 (2.7B) down to 1.8B parameters, followed by fine-tuning on C4.
|
|
|
- **Developed by:** Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar |
|
- **Model type:** Decoder-only |
|
- **Language(s) (NLP):** English |
|
- **License:** MIT |
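

The checkpoint should load through the standard 🤗 transformers API. Below is a minimal usage sketch: the repo id is taken from the license link in this card, while the dtype and `trust_remote_code` settings are assumptions rather than documented requirements.

```python
# Hedged usage sketch: repo id comes from the license link in this card;
# dtype and trust_remote_code are assumptions, not documented settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "luciodery/Bonsai-PrunedPhi-1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision for single-GPU inference
    trust_remote_code=True,     # assumption: the pruned Phi-2 may ship custom modeling code
)

prompt = "Structured pruning of large language models"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```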
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** [https://github.com/ldery/Bonsai/tree/main](https://github.com/ldery/Bonsai/tree/main)

- **Paper:** [https://arxiv.org/abs/2402.05406](https://arxiv.org/abs/2402.05406)
|
|
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
Fine-tuned on 100K sequences of length 2048 from the [C4 dataset](https://huggingface.co/datasets/c4).
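

As an illustration of how such a corpus might be assembled, here is a sketch under assumptions (streaming C4 from the Hub, the Phi-2 tokenizer, and simple contiguous packing); it is not necessarily the authors' exact pipeline.

```python
# Hypothetical data-preparation sketch: pack streamed C4 text into
# 100K contiguous sequences of 2048 tokens each.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
stream = load_dataset("c4", "en", split="train", streaming=True)

SEQ_LEN, NUM_SEQS = 2048, 100_000
buffer, sequences = [], []
for example in stream:
    buffer.extend(tokenizer(example["text"])["input_ids"])
    # Emit fixed-length chunks as soon as the token buffer is long enough.
    while len(buffer) >= SEQ_LEN and len(sequences) < NUM_SEQS:
        sequences.append(buffer[:SEQ_LEN])
        buffer = buffer[SEQ_LEN:]
    if len(sequences) >= NUM_SEQS:
        break
```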
|
|
|
### Training Procedure |
|
|
|
Full fine-tuning of the pruned model, with a knowledge-distillation loss term (see the hyperparameters below).
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- Distillation KL weight: 0.01

- Learning rate: 1e-4

- Batch size: 128

- Optimizer: AdamW

- Warmup steps: 5
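

The distillation KL weight above implies the fine-tuning objective combines next-token cross-entropy with a KL term against teacher logits, presumably from the unpruned Phi-2. Below is a minimal sketch of such a loss, as an assumption rather than the authors' exact implementation.

```python
# Hedged sketch of a distillation objective consistent with the
# hyperparameters above; function and variable names are illustrative.
import torch.nn.functional as F

KL_WEIGHT = 0.01  # "Distillation KL weight" from this card

def distillation_loss(student_logits, teacher_logits, labels):
    """Cross-entropy on the labels plus KL(teacher || student) on the logits."""
    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), labels.reshape(-1))
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1).reshape(-1, vocab),
        F.softmax(teacher_logits, dim=-1).reshape(-1, vocab),
        reduction="batchmean",  # mean over all token positions
    )
    return ce + KL_WEIGHT * kl
```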
|
|
|
### License |
|
|
|
The model is licensed under the [MIT license](https://huggingface.co/luciodery/Bonsai-PrunedPhi-1.8B/blob/main/LICENSE). |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** NVIDIA A6000 |
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
```bibtex
@misc{dery2024everybody,
      title={Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes},
      author={Lucio Dery and Steven Kolawole and Jean-Francois Kagy and Virginia Smith and Graham Neubig and Ameet Talwalkar},
      year={2024},
      eprint={2402.05406},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
|
|
|
|
|
## Model Card Authors
|
|
|
Lucio Dery: [email protected] |
|
|
|
## Model Card Contact |
|
|
|
[email protected] |