metadata
language: en
license: apache-2.0
Shears Model Card: Shears-llama-13b-50-math-heuristic
This model is LLaMA-13B at 50% sparsity, fine-tuned on math reasoning datasets using Shears.
Model Details
Information
- Model name: Shears-llama-13b-50-math-heuristic
- Base model: LLaMA-13B
- Sparsity: 50%
- Domain: Math
- Subnetwork version: Heuristic
Adapter Configuration
- LoRA rank: 32 (24 in the heuristic subnetwork)
- LoRA alpha: 64
- LoRA target modules: q_proj, k_proj, v_proj, up_proj, down_proj
- LoRA rank search space: [32, 24, 16]
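To give a sense of what the ranks in the search space mean in practice, the sketch below estimates the number of trainable adapter parameters at each rank. It assumes the commonly cited LLaMA-13B dimensions (hidden size 5120, MLP intermediate size 13824, 40 decoder layers), which are not stated in this card, and it assumes one LoRA adapter of uniform rank on every target module in every layer.

```python
# Assumed LLaMA-13B shapes (not stated in this model card).
HIDDEN = 5120
INTERMEDIATE = 13824
NUM_LAYERS = 40

# (input dim, output dim) of each LoRA target module listed above.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank):
    """Adapter parameters when every target module uses the same rank.

    A LoRA adapter on a (d_in x d_out) linear layer adds two matrices,
    A (d_in x rank) and B (rank x d_out), i.e. rank * (d_in + d_out) values.
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


for rank in (32, 24, 16):
    print(f"rank {rank}: {lora_param_count(rank):,} adapter parameters")
```

Under these assumptions, the heuristic subnetwork (rank 24) carries three quarters of the adapter parameters of the maximal rank-32 configuration.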
Training Hyperparameters
- Batch size: 16
- Learning rate: 3e-4
- Epochs: 3
Training Data
Unified math reasoning dataset: math_10k.json (collected with the training sets of GSM8K, MAWPS, and AQuA).
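As a rough illustration of how a record from such a file could be turned into training text, the sketch below pairs the prompt template from "How to use" with a reference answer. The field names `instruction` and `output`, the helper names `load_records` and `build_training_text`, and the sample record are all assumptions made for illustration; check them against the actual file and the training script in the repository.

```python
import json


def generate_prompt(instruction):
    # Same template as in the "How to use" section of this card.
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
"""


def load_records(path):
    """Load a JSON file that holds a list of records (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_training_text(record):
    """Concatenate the prompt and the reference answer (assumed field names)."""
    return generate_prompt(record["instruction"]) + record["output"]


# Made-up record for illustration only, not taken from math_10k.json.
sample = {
    "instruction": "What is 3 + 4?",
    "output": "3 + 4 = 7. The answer is 7.",
}
print(build_training_text(sample))
```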
Evaluation Data
GSM8K, AQuA, MAWPS, and SVAMP (the benchmarks reported under Evaluation Results).
How to use
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer


def generate_prompt(instruction):
    # Instruction-style prompt template expected by the fine-tuned model.
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
"""


# Local paths to the sparsified base model and the adapter weights.
base_model_path = "shears-llama-13b-50-math-heuristic/base_model"
adapter_model_path = "shears-llama-13b-50-math-heuristic/adapter_model"

# Load the base model, then attach the adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base_model, adapter_model_path)
model.eval()

# Count non-zero parameters to check the effect of sparsity.
non_zero_params = sum([(param.data != 0).sum().item() for _, param in model.named_parameters()])
print(f"Number of all non-zero parameters: {non_zero_params}")

# The tokenizer is loaded from the base model directory
# (the original snippet referenced an undefined `model_path`).
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.pad_token_id = 0

instruction = "Edgar eats 18 pretzels a day. If his brother eats 1/2 as many, how many does his brother eat in a week?"
prompt = generate_prompt(instruction)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(model.device)

# Beam search generation without gradient tracking.
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
        use_cache=True,
        num_beams=4,
    )

# The first sequence contains the prompt followed by the generated response.
s = generation_output.sequences[0]
output = tokenizer.decode(s)
print(output)
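The decoded `output` above contains the full prompt followed by the model's answer. A minimal sketch for keeping only the answer is shown below: it splits on the `### Response:` marker from the prompt template and drops the LLaMA end-of-sequence token `</s>`. The helper name `extract_response` and the example string are hypothetical; the string is hand-written for illustration and is not real model output.

```python
RESPONSE_MARKER = "### Response:"
EOS_TOKEN = "</s>"  # LLaMA end-of-sequence token, kept by decode() by default


def extract_response(decoded):
    """Return the text after the last response marker, without the EOS token."""
    _, found, tail = decoded.rpartition(RESPONSE_MARKER)
    if not found:
        # Marker missing: fall back to the whole string.
        tail = decoded
    return tail.replace(EOS_TOKEN, "").strip()


# Hand-written example string, not actual output of this model.
decoded = (
    "<s> Below is an instruction that describes a task.\n"
    "### Instruction:\nHow many pretzels?\n"
    "### Response:\n"
    "His brother eats 18 / 2 = 9 pretzels a day, so 9 * 7 = 63 in a week.</s>"
)
print(extract_response(decoded))
```

Passing `skip_special_tokens=True` to `tokenizer.decode` is an alternative way to drop the special tokens before splitting.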
Evaluation Results
| Model | Sparsity | GSM8K | AQuA | MAWPS | SVAMP | Average |
|---|---|---|---|---|---|---|
| LLaMA-7B-LoRA | - | 37.5 | 18.9 | 79.0 | 52.1 | 46.9 |
| LLaMA-7B-Shears | 50% | 36.1 | 22.0 | 78.6 | 44.5 | 45.3 |
| LLaMA-13B-LoRA | - | 47.5 | 18.5 | 83.6 | 54.6 | 51.1 |
| LLaMA-13B-Shears | 50% | 45.1 | 22.0 | 83.2 | 53.3 | 50.9 |
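The Average column appears to be the unweighted mean of the four benchmark scores, rounded to one decimal place. The sketch below checks that reading against the table and prints the gap between each Shears model and its dense LoRA counterpart. Treating the average as an unweighted mean with half-up rounding is an assumption, although it reproduces every reported value.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above: GSM8K, AQuA, MAWPS, SVAMP.
RESULTS = {
    "LLaMA-7B-LoRA": ["37.5", "18.9", "79.0", "52.1"],
    "LLaMA-7B-Shears": ["36.1", "22.0", "78.6", "44.5"],
    "LLaMA-13B-LoRA": ["47.5", "18.5", "83.6", "54.6"],
    "LLaMA-13B-Shears": ["45.1", "22.0", "83.2", "53.3"],
}


def average(scores):
    """Unweighted mean rounded half-up to one decimal (exact decimal arithmetic)."""
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for name, scores in RESULTS.items():
    print(f"{name}: {average(scores)}")

# Average gap between the 50%-sparse Shears model and the dense LoRA baseline.
for size in ("7B", "13B"):
    gap = average(RESULTS[f"LLaMA-{size}-Shears"]) - average(RESULTS[f"LLaMA-{size}-LoRA"])
    print(f"LLaMA-{size}: Shears minus LoRA = {gap}")
```

`Decimal` is used instead of floats so that a value such as 51.05 rounds to 51.1, matching the table.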
Model Sources
- Repository: https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears
- Paper: Shears: Unstructured Sparsity with Neural Low-rank Adapter Search
Citation
@article{munoz2024shears,
title = {Shears: Unstructured Sparsity with Neural Low-rank Adapter Search},
author={J. Pablo Munoz and Jinjie Yuan and Nilesh Jain},
journal={},
year={2024}
}
License
Apache-2.0