TokenButler
Collection
TokenButler -- Predict token importance for all heads across the transformer in the first layer itself. Enable fine-grained token sparsity!
The collection of TokenButler models can be found here. To run the meta-llama/Llama-2-7b-hf model, use the following:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
question = "If millionaires have butlers, why don't million dollar language models have a butler too? I think its because "
model_name = "akhauriyash/Llama-2-7b-hf-Butler"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
response = generator(question, max_new_tokens=200, do_sample=True, top_p=0.95, temperature=0.7)
print(response[0]['generated_text'][len(question):])
The default configured sparsity is 50%, with a sliding window of 128 and 8 anchor tokens. To change the sparsity, call the following function after loading the model. At the moment "fixed" is the only supported strategy: it fixes the sparsity of every layer except the first at the given percentage ("pc"). The same function can also be found in test_hf.py. The sliding window and anchor tokens can be changed in a similar manner.
def set_sparsity(model, sparsity):
    # Apply the sparsity setting to every TokenButler attention module.
    for module in model.modules():
        if "AttentionExperimental" in module.__class__.__name__:
            module.token_sparse_method = sparsity
            module.set_token_sparsity()
    return model

model = set_sparsity(model, "fixed_60pc")
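The sparsity argument above is a plain string, so a typo such as "fixed_60" is easy to make. The sketch below shows one way to build and inspect these labels. Both helper names (fixed_sparsity_label and current_sparsity_labels) are hypothetical and not part of TokenButler. The "fixed_<N>pc" pattern is inferred from the single "fixed_60pc" example above, and the 0 to 100 integer range is only a sanity check, not a documented limit.

```python
def fixed_sparsity_label(percent):
    """Build a label such as "fixed_60pc" for the 'fixed' strategy.

    Assumption: labels follow the "fixed_<N>pc" pattern with an integer N,
    inferred from the "fixed_60pc" example in this model card.
    """
    if isinstance(percent, bool) or not isinstance(percent, int):
        raise TypeError("percent must be an integer")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return f"fixed_{percent}pc"


def current_sparsity_labels(model):
    """Return token_sparse_method for each AttentionExperimental module.

    Uses the same class-name check as set_sparsity, so it only reports
    modules that set_sparsity would modify.
    """
    return [
        module.token_sparse_method
        for module in model.modules()
        if "AttentionExperimental" in module.__class__.__name__
    ]
```

With the model loaded, this could be used as model = set_sparsity(model, fixed_sparsity_label(70)), followed by print(current_sparsity_labels(model)) to confirm that every affected module carries the new label.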
@misc{akhauri2025tokenbutlertokenimportancepredictable,
title={TokenButler: Token Importance is Predictable},
author={Yash Akhauri and Ahmed F AbouElhamayed and Yifei Gao and Chi-Chih Chang and Nilesh Jain and Mohamed S. Abdelfattah},
year={2025},
eprint={2503.07518},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.07518},
}
Base model
meta-llama/Llama-2-7b-hf