---
language: en
license: apache-2.0
---


# Shears Model Card: shears-llama-13b-50-math-heuristic

This is the heuristic subnetwork discovered from the [super-network](https://huggingface.co/IntelLabs/shears-llama-13b-50-math-super), which was fine-tuned from LLaMA-13B on math reasoning datasets with Shears.

## Model Details

### Information

- **Model name:** shears-llama-13b-50-math-heuristic
- **Base model:** [LLaMA-13b](https://huggingface.co/yahma/llama-13b-hf)
- **Sparsity:** 50%
- **Domain:** Math
- **Subnetwork version:** Heuristic
- **NNCF Configuration:** [nncf_shears_llama_13b_sparsity50.json](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears/nncf_config/unified_math/nncf_shears_llama_13b_sparsity50.json)

### Adapter Configuration

- **LoRA rank:** 32 (24 in the heuristic subnetwork)
- **LoRA alpha:** 64
- **LoRA target modules:** q_proj, k_proj, v_proj, up_proj, down_proj
- **LoRA rank search space:** [32, 24, 16] (for each LoRA module)
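For reference, the same settings expressed as a plain PEFT `LoraConfig` might look like the sketch below. This is illustrative only: the `adapter_config.json` shipped with this model is authoritative, and it is the Shears/NNCF machinery, not PEFT itself, that makes each rank elastic over the search space.

```python
# Illustrative sketch only: a plain LoRA configuration with the settings
# listed above. The shipped adapter_config.json is the source of truth.
from peft import LoraConfig

config = LoraConfig(
    r=32,            # maximum rank; the heuristic subnetwork settles on 24
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
)
```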



### Training Hyperparameters



- **Batch size:** 16
- **Learning rate:** 3e-4
- **Epochs:** 3
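For orientation, mapping these hyperparameters onto `transformers.TrainingArguments` would look roughly as follows; this is a hypothetical sketch, and the actual Shears training entry point lives in the repository linked under Model Sources.

```python
# Hypothetical sketch only: the reported hyperparameters expressed as
# transformers.TrainingArguments. The real Shears training scripts are
# in the IntelLabs repository linked below.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./shears-llama-13b-50-math",  # hypothetical output path
    per_device_train_batch_size=16,
    learning_rate=3e-4,
    num_train_epochs=3,
)
```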



### Training Data



Unified math reasoning dataset: [math_10k.json](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/ft-training_set/math_10k.json) (collected from the training sets of GSM8K, MAWPS, and AQuA).
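To get a feel for the data, the file can be inspected locally; the short sketch below assumes `math_10k.json` has already been downloaded from the LLM-Adapters repository.

```python
# Minimal sketch for inspecting the unified training set, assuming
# math_10k.json sits in the working directory.
import json

with open("math_10k.json") as f:
    examples = json.load(f)

print(f"{len(examples)} training examples")
print(examples[0].keys())  # inspect the per-example fields
```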



### Evaluation Data

[GSM8K](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/gsm8k/test.json), [AQuA](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/AQuA/test.json), [MAWPS](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/mawps/test.json), [SVAMP](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/dataset/SVAMP/test.json)





## How to use



Use our modified PEFT library (apply [patch](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears/patches/peft-modifications-for-shears-inference-usage.patch)):

```bash
git clone https://github.com/huggingface/peft.git
pushd peft && git checkout v0.5.0 && git apply --ignore-space-change --ignore-whitespace peft-modifications-for-shears-inference-usage.patch && pip install -e . && popd
```



```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_prompt(instruction):
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

                    ### Instruction:
                    {instruction}

                    ### Response:
                    """


base_model_path = "shears-llama-13b-50-math-heuristic/base_model"
adapter_model_path = "shears-llama-13b-50-math-heuristic/adapter_model"
base_model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base_model, adapter_model_path)
model.eval()

# Count non-zero weights to verify the unstructured sparsity of the loaded model.
non_zero_params = sum([(param.data != 0).sum().item() for _, param in model.named_parameters()])
print(f"Number of all non-zero parameters: {non_zero_params}")

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.pad_token_id = 0

instruction = "Edgar eats 18 pretzels a day. If his brother eats 1/2 as many, how many does his brother eat in a week?"
prompt = generate_prompt(instruction)

inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(model.device)
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
        use_cache=True,
        num_beams=4,
    )
s = generation_output.sequences[0]
output = tokenizer.decode(s)
print(output)
```



## Evaluation Results



Accuracy (%) on the math reasoning test sets:

| Model                 | Sparsity | GSM8K | AQuA | MAWPS | SVAMP | Average |
|-----------------------|----------|-------|------|-------|-------|---------|
| LLaMA-7B-LoRA         | -        | 37.5  | 18.9 | 79.0  | 52.1  | 46.9    |
| [**LLaMA-7B-Shears**](https://huggingface.co/IntelLabs/shears-llama-7b-50-math-heuristic)   | **50%** | 36.1 | 22.0 | 78.6 | 44.5 | 45.3 |
| LLaMA-13B-LoRA        | -        | 47.5  | 18.5 | 83.6  | 54.6  | 51.1    |
| [**LLaMA-13B-Shears**](https://huggingface.co/IntelLabs/shears-llama-13b-50-math-heuristic) | **50%** | 45.1 | 22.0 | 83.2 | 53.3 | 50.9 |



## Model Sources



- **Repository:** [https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears)
- **Paper:** [Shears: Unstructured Sparsity with Neural Low-rank Adapter Search]()



## Citation



```bibtex
@article{munoz2024shears,
  title={Shears: Unstructured Sparsity with Neural Low-rank Adapter Search},
  author={J. Pablo Munoz and Jinjie Yuan and Nilesh Jain},
  journal={},
  year={2024}
}
```

## License

Apache-2.0