Missing attention_mask on hook
#44 · opened by riedgar-ms
I'm attempting to use Phi-4 with the attention-steering approach of PASTA. However, this runs into trouble because the hook registered on the self-attention layer is not being passed an attention_mask: the argument is present when the hooked function is called, but it is set to None. Is this expected? The same hook on Phi-3 works fine. A minimal sketch of the kind of hook involved is below.
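
Roughly, the setup looks like the following. This is a minimal sketch rather than my exact PASTA code; the model id and the `model.model.layers[...].self_attn` module path are assumptions based on the published Phi checkpoints, and it assumes PyTorch 2.x so that the hook can receive keyword arguments via `with_kwargs=True` (attention_mask typically arrives as a kwarg in recent transformers versions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id; swap in the checkpoint actually in use.
MODEL_ID = "microsoft/phi-4"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def report_attention_mask(module, args, kwargs):
    # attention_mask is expected as a keyword argument here; on Phi-4
    # it shows up as None, while on Phi-3 it is a real tensor.
    mask = kwargs.get("attention_mask")
    print(f"{module.__class__.__name__}: attention_mask = "
          f"{'None' if mask is None else tuple(mask.shape)}")

# Hook the self-attention submodule of the first decoder layer; the
# module path is an assumption carried over from the Phi-3 layout.
handle = model.model.layers[0].self_attn.register_forward_pre_hook(
    report_attention_mask, with_kwargs=True
)

inputs = tokenizer("Hello, world", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

handle.remove()
```

With Phi-3 the printed mask has the expected shape, but with Phi-4 the same hook prints None.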