---
language: en
license: apache-2.0
---
# Shears Model Card: shears-llama-13b-50-math-heuristic
The heuristic subnetwork discovered from the [super-network](https://huggingface.co/IntelLabs/shears-llama-13b-50-math-super), which was fine-tuned from LLaMA-13B on unified math reasoning datasets using Shears.
## Model Details
### Information
- **Model name:** shears-llama-13b-50-math-heuristic
- **Base model:** [LLaMA-13b](https://huggingface.co/yahma/llama-13b-hf)
- **Sparsity:** 50%
- **Domain:** Math
- **Subnetwork version:** Heuristic
- **NNCF Configuration:** [nncf_shears_llama_13b_sparsity50.json](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears/nncf_config/unified_math/nncf_shears_llama_13b_sparsity50.json)
### Adapter Configuration
- **LoRA rank:** 32 (24 in the heuristic subnetwork)
- **LoRA alpha:** 64
- **LoRA target modules:** q_proj, k_proj, v_proj, up_proj, down_proj
- **LoRA rank search space:** [32, 24, 16] (for each LoRA module)
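For reference, the adapter settings above roughly correspond to the following PEFT `LoraConfig`. This is an illustrative sketch rather than the shipped `adapter_config.json`; values not listed on this card (e.g. `lora_dropout`) are assumptions.

```python
from peft import LoraConfig

# Illustrative only: mirrors the adapter settings listed above. The rank search
# space [32, 24, 16] is explored during training; the heuristic subnetwork keeps
# rank 24 for its LoRA modules.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,   # assumption; not stated on this card
    bias="none",
    task_type="CAUSAL_LM",
)
```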
### Training Hyperparameters
- **Batch size:** 16
- **Learning rate:** 3e-4
- **Epoch:** 3
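As a rough illustration, these hyperparameters map onto standard Hugging Face training arguments as sketched below. The actual fine-tuning is driven by the Shears training scripts together with the NNCF configuration, so the output path and any field not listed above are placeholders.

```python
from transformers import TrainingArguments

# Sketch only: the card's hyperparameters expressed as TrainingArguments.
# The real run uses the Shears repository's scripts plus the NNCF config.
training_args = TrainingArguments(
    output_dir="./shears-llama-13b-50-math",  # placeholder path
    per_device_train_batch_size=16,
    learning_rate=3e-4,
    num_train_epochs=3,
)
```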
### Training Data
Unified math reasoning dataset: [math_10k.json](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/ft-training_set/math_10k.json) (collected with the training sets of GSM8K, MAWPS, and AQuA).
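A minimal sketch for inspecting the training set is shown below; it assumes each record exposes at least an `instruction` and an `output` field (field names are not documented on this card).

```python
import json

# Load the unified math reasoning dataset and peek at one example.
# Field names ("instruction", "output") are assumptions, not taken from this card.
with open("math_10k.json") as f:
    records = json.load(f)

print(f"{len(records)} training examples")
print(records[0].get("instruction", ""))
```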
### Evaluation Data
[GSM8K](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/gsm8k/test.json), [AQuA](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/AQuA/test.json), [MAWPS](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/mawps/test.json), [SVAMP](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/SVAMP/test.json)
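Evaluation on these benchmarks is typically exact match on the final answer extracted from the generated text. The sketch below shows the numeric case (GSM8K, MAWPS, SVAMP); it is not the exact scoring script used for the results table further down.

```python
import re

# Extract the last number from a piece of text; a model response can be
# compared against a gold answer this way for numeric math benchmarks.
def last_number(text: str):
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

assert last_number("His brother eats 9 pretzels a day, so 63 in a week.") == 63.0
```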
## How to use
Use our modified PEFT library by applying the [patch](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears/patches/peft-modifications-for-shears-inference-usage.patch) to PEFT v0.5.0. Download the patch file into the `peft` checkout (the `git apply` command below references it by file name), then:
```bash
git clone https://github.com/huggingface/peft.git
pushd peft
git checkout v0.5.0
git apply --ignore-space-change --ignore-whitespace peft-modifications-for-shears-inference-usage.patch
pip install -e .
popd
```
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_prompt(instruction):
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""


base_model_path = "shears-llama-13b-50-math-heuristic/base_model"
adapter_model_path = "shears-llama-13b-50-math-heuristic/adapter_model"

# Load the sparsified base model and attach the heuristic LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base_model, adapter_model_path)
model.eval()

# With 50% unstructured sparsity, about half of the base weights are zero,
# so the non-zero parameter count is the meaningful size measure.
non_zero_params = sum((param.data != 0).sum().item() for _, param in model.named_parameters())
print(f"Number of all non-zero parameters: {non_zero_params}")

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.pad_token_id = 0

instruction = "Edgar eats 18 pretzels a day. If his brother eats 1/2 as many, how many does his brother eat in a week?"
prompt = generate_prompt(instruction)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(model.device)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
        use_cache=True,
        num_beams=4,
    )

s = generation_output.sequences[0]
output = tokenizer.decode(s)
print(output)
```
## Evaluation Results
| Model | Sparsity | GSM8K | AQuA | MAWPS | SVAMP | Average |
|-----------------------|-------------|-------|-------|-------|-------|---------|
| LLaMA-7B-LoRA | - | 37.5 | 18.9 | 79.0 | 52.1 | 46.9 |
| [**LLaMA-7B-Shears**](https://huggingface.co/IntelLabs/shears-llama-7b-50-math-heuristic) | **50%** | 36.1 | 22.0 | 78.6 | 44.5 | 45.3 |
| LLaMA-13B-LoRA | - | 47.5 | 18.5 | 83.6 | 54.6 | 51.1 |
| [**LLaMA-13B-Shears**](https://huggingface.co/IntelLabs/shears-llama-13b-50-math-heuristic) | **50%** | 45.1 | 22.0 | 83.2 | 53.3 | 50.9 |
## Model Sources
- **Repository:** [https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears)
- **Paper:** [Shears: Unstructured Sparsity with Neural Low-rank Adapter Search]()
## Citation
```bibtex
@article{munoz2024shears,
  title   = {Shears: Unstructured Sparsity with Neural Low-rank Adapter Search},
  author  = {J. Pablo Munoz and Jinjie Yuan and Nilesh Jain},
  journal = {},
  year    = {2024}
}
```
## License
Apache-2.0