ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper: arXiv:2403.03853
Note

1. For each layer $i$, compute the cosine similarity (normalized dot-product) between the input hidden-state vectors $X_{i,t}$ and the corresponding output hidden-state vectors $X_{i+1,t}$. Block Influence (BI) is defined as $\mathrm{BI}_i = 1 - \mathbb{E}_{X,t}\!\left[\frac{X_{i,t}^{\top} X_{i+1,t}}{\lVert X_{i,t}\rVert_2 \, \lVert X_{i+1,t}\rVert_2}\right]$. If a layer's input and output are nearly identical, the layer performed little transformation, so its BI is low.
2. Run a calibration set through the model to "profile" it, averaging the layerwise BI over this evaluation set (see the sketch after this list).
3. Prune the lowest-BI blocks first.
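A minimal sketch of this profile-then-prune loop, assuming a Llama-style Hugging Face checkpoint whose decoder stack lives at `model.model.layers`; the model name, calibration texts, and pruning count `k` below are placeholders, not values from the paper:

```python
# Sketch of layerwise BI profiling + pruning for a Llama-style causal LM.
# Assumptions: decoder blocks at model.model.layers; placeholder model name
# and calibration texts; not the paper's exact implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any Llama-style model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
model.eval()

@torch.no_grad()
def layerwise_bi(texts):
    """BI_i = 1 - E_t[cos(X_{i,t}, X_{i+1,t})], averaged over a calibration set."""
    n_layers = model.config.num_hidden_layers
    bi = torch.zeros(n_layers)
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        # hidden_states is a tuple of length n_layers + 1: entry i is the
        # input to decoder layer i, entry i + 1 is that layer's output.
        hs = model(**inputs, output_hidden_states=True).hidden_states
        for i in range(n_layers):
            cos = F.cosine_similarity(hs[i], hs[i + 1], dim=-1)  # (1, seq_len)
            bi[i] += (1.0 - cos).float().mean()
    return bi / len(texts)

calibration_texts = ["The quick brown fox jumps over the lazy dog."]  # placeholder
bi = layerwise_bi(calibration_texts)

# Remove the k lowest-BI (most redundant) blocks, highest index first so
# that the remaining indices stay valid while deleting.
k = 4
for i in sorted(torch.argsort(bi)[:k].tolist(), reverse=True):
    del model.model.layers[i]
model.config.num_hidden_layers = len(model.model.layers)
```

One caveat: in recent `transformers` versions each decoder layer stores its own `layer_idx` for KV-cache bookkeeping, so generation with a cache after pruning may also require re-assigning those indices.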