Why doesn't the S1/S2 text encoder use attn_mask or key_padding_mask to handle padding tokens? Without a mask, attention seems to be paid to the padding tokens as well, rather than only to the valid tokens.
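To illustrate the concern, here is a minimal pure-Python sketch of scaled dot-product attention weights for a single query. It is not the S1/S2 encoder code. The helper name `attention_weights` and the toy vectors are hypothetical. The mask follows PyTorch's boolean `key_padding_mask` convention, where `True` marks a padding position to ignore. Without a mask, padded keys receive non-zero weight. With the mask, their scores become -inf before the softmax, so they get exactly zero weight.

```python
import math


def attention_weights(query, keys, key_padding_mask=None):
    """Softmax attention weights of one query over a list of key vectors.

    Hypothetical helper for illustration. key_padding_mask uses the
    PyTorch boolean convention: True means "padding, ignore this key".
    """
    d = len(query)
    # Scaled dot-product scores: q . k / sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    if key_padding_mask is not None:
        # Masked keys get -inf so the softmax assigns them exactly zero weight.
        scores = [float("-inf") if pad else s for s, pad in zip(scores, key_padding_mask)]
    # Numerically stable softmax (assumes at least one key is unmasked).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two valid tokens followed by two padding tokens (toy vectors).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
padding = [False, False, True, True]

unmasked = attention_weights(query, keys)           # padding keys get non-zero weight
masked = attention_weights(query, keys, padding)    # padding keys get 0.0 weight
print(unmasked)
print(masked)
```

Note that masking keys only stops tokens from attending to padding. The outputs at padded query positions are still computed, so they are usually zeroed or discarded afterwards.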