Your Transformer is Secretly Linear
Paper • 2405.12250
Note: The authors define a linearity metric based on the Procrustes problem, then use it to measure how well different parts of an LM can be approximated by a linear transform. It turns out some parts are pretty linear, especially after fine-tuning for a while. I played a bit with those ideas here: https://x.com/kgourg/status/1795428844167393598 Here's a Colab notebook: https://colab.research.google.com/drive/1H9sdTMEDzesVETzx5i1Vai_yrViEgkcM?usp=sharing
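To make the metric concrete, here is a minimal sketch of a Procrustes-style linearity score in NumPy. It assumes the score is one minus the least-squares residual of the best linear map between two sets of embeddings, after centering both and scaling them to unit Frobenius norm. The function name `linearity_score` and the toy data are my own illustration, not code from the paper or the notebook.

```python
import numpy as np

def linearity_score(X, Y):
    """Assumed form of the score: 1 - min_A ||X_n A - Y_n||_F^2,
    where X_n and Y_n are centered and scaled to unit Frobenius norm."""
    # Center each feature so a constant shift does not count as nonlinearity.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Scale to unit Frobenius norm so the score lands in [0, 1].
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Best linear map in the least-squares sense.
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residual = np.linalg.norm(X @ A - Y) ** 2
    return 1.0 - residual

# Toy data standing in for the inputs and outputs of one block (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
W = rng.normal(size=(16, 16))

score_linear = linearity_score(X, X @ W + 0.5)            # affine map
score_nonlinear = linearity_score(X, np.tanh(3 * X @ W))  # saturating map

print(f"affine map:     {score_linear:.4f}")
print(f"saturating map: {score_nonlinear:.4f}")
```

On real models, `X` and `Y` would be the hidden states entering and leaving a layer, stacked over many tokens. A score close to 1 means a single matrix reproduces that layer's effect almost exactly.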