🔍 Today's pick in Interpretability & Analysis of LMs: AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers by @RedOneAI et al.
This work proposes extending the LRP feature attribution framework to handle Transformer-specific layers. In particular, the authors:
1. Propose a generalized approach to softmax linearization by designing a distribution rule that incorporates the bias term, absorbing a portion of the relevance.
2. Propose decomposing the element-wise and matrix multiplications in the attention operation into a sequence of epsilon and uniform distribution rules to ensure conservation (= the sum of relevance stays constant across layers); see the toy sketch below.
3. Propose handling normalisation layers with an identity distribution rule.
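To make the conservation idea concrete, here is a minimal, self-contained sketch (not the authors' implementation) of how an epsilon rule followed by a uniform split could redistribute relevance through a toy matrix product O = A @ B, standing in for the multiplications inside attention. All function names and the NumPy setup are illustrative assumptions, not the paper's code.

```python
# Toy sketch of the two distribution rules mentioned above, applied to O = A @ B.
import numpy as np

EPS = 1e-6

def eps_rule_matmul(A, B, R_out):
    """epsilon rule: propagate the output relevance R_out of O = A @ B back to
    each operand, treating the other operand as fixed weights."""
    O = A @ B
    S = R_out / (O + EPS * np.sign(O))   # stabilised relevance / activation ratio
    R_A = A * (S @ B.T)                  # relevance landing on A
    R_B = B * (A.T @ S)                  # relevance landing on B
    return R_A, R_B

def uniform_split(R_A, R_B):
    """uniform rule: split relevance equally between the two operands so the
    total relevance flowing through the layer is conserved."""
    return 0.5 * R_A, 0.5 * R_B

A = np.random.randn(4, 8)
B = np.random.randn(8, 5)
R_out = np.random.randn(4, 5)            # relevance arriving from the layer above

R_A, R_B = uniform_split(*eps_rule_matmul(A, B, R_out))
# conservation check: relevance in == relevance out (up to the epsilon stabiliser)
print(np.isclose(R_A.sum() + R_B.sum(), R_out.sum(), atol=1e-3))
```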
Through extensive experiments, the authors show that AttnLRP:
1. Is significantly more faithful than other popular gradient- and attention-based attribution approaches on CV and NLP tasks with large Transformer models.
2. Runs in O(1) time (a single backward pass) and requires O(sqrt(num_layers)) memory, as opposed to perturbation-based approaches requiring O(seq_len) time; a rough sketch of this contrast follows below.
3. Can be used alongside activation maximisation to explain the contribution of granular model components in driving models' predictions.
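To illustrate the runtime contrast in point 2, here is a hedged sketch comparing a single-backward-pass attribution (plain gradient × input is used as a simple stand-in for the actual AttnLRP rules) against an occlusion-style perturbation baseline that needs one forward pass per input token. `model`, `embed`, and `mask_id` are assumed to be a standard PyTorch/Hugging Face causal LM, its embedding layer, and a mask/pad token id; none of this is taken from the paper's code.

```python
import torch

def single_pass_attribution(model, embed, input_ids, target_id):
    """One forward + one backward pass, regardless of sequence length."""
    x = embed(input_ids).detach().requires_grad_(True)
    logit = model(inputs_embeds=x).logits[0, -1, target_id]
    logit.backward()
    return (x * x.grad).sum(-1)              # one relevance score per token

@torch.no_grad()
def perturbation_attribution(model, input_ids, target_id, mask_id):
    """One forward pass per token, i.e. O(seq_len) model calls."""
    base = model(input_ids).logits[0, -1, target_id]
    scores = []
    for i in range(input_ids.shape[1]):
        masked = input_ids.clone()
        masked[0, i] = mask_id                # occlude token i
        scores.append(base - model(masked).logits[0, -1, target_id])
    return torch.stack(scores)
```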