arxiv:2605.08423

Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms

Published on May 8
Submitted by Connor T. Jerzak on May 12

Abstract

A data-adaptive method for efficient fine-tuning of large neural networks uses a shared memory of low-rank update atoms with attention-based routing to enable dynamic, context-sensitive parameter updates while maintaining scalability.

AI-generated summary

We present a data-adaptive method for parameter-efficient fine-tuning of large neural networks. Standard low-rank adaptation methods improve efficiency by restricting each layer update to a fixed low-rank form, but this static parameterization can be too rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Our approach replaces a purely layer-local adapter with a shared queryable memory of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks, uses this query to retrieve a content-dependent combination of shared update components via attention, and applies the resulting routed operator within the low-rank bottleneck. In this way, the method retains the efficiency and scalability of low-rank adaptation while allowing the effective update to vary across inputs and to share reusable structure across layers. The resulting architecture provides a principled middle ground between static LoRA-style updates and fully generated parameter updates: it remains compact and parameter-efficient while supporting dynamic, context-sensitive adaptation. Further, we incorporate instruction-regularization by augmenting routing logits with a language-induced prior over update atoms, thereby biasing the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. Experiments on noisy non-linear regression tasks and LLM fine-tuning suggest that this queryable update-memory formulation can improve final test performance and training stability compared to standard low-rank adaptation, while using a comparable number of trainable parameters.
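The core mechanism described above — querying a shared memory of low-rank atoms via attention and applying the routed operator inside the bottleneck — can be sketched in NumPy. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the dimensions, the query map `Wq`, and the placement of the routed operator as an r×r mixing matrix between the LoRA down- and up-projections are all choices made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, K = 16, 4, 8  # model dim, bottleneck rank, number of shared atoms (illustrative sizes)

A = rng.normal(size=(r, d)) * 0.1         # layer-local down-projection, as in standard LoRA
B = rng.normal(size=(d, r)) * 0.1         # layer-local up-projection
atoms = rng.normal(size=(K, r, r)) * 0.1  # shared memory of low-rank update atoms
keys = rng.normal(size=(K, r))            # one retrieval key per atom
Wq = rng.normal(size=(r, 2 * r)) * 0.1    # query map over [bottleneck state, depth summary]

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def routed_update(x, summary, prior_logits=None):
    """Apply a content-dependent low-rank update to input x (shape (d,)).

    summary: running summary of previous blocks (shape (r,)).
    prior_logits: optional language-induced prior over atoms (shape (K,)).
    """
    h = A @ x                               # project into the low-rank bottleneck
    q = Wq @ np.concatenate([h, summary])   # query from current state + depth-wise summary
    logits = keys @ q / np.sqrt(r)          # scaled attention scores over the atom memory
    if prior_logits is not None:
        logits = logits + prior_logits      # instruction-regularized routing
    w = softmax(logits)
    M = np.tensordot(w, atoms, axes=1)      # attention-weighted combination of atoms, (r, r)
    return B @ (M @ h)                      # routed operator applied within the bottleneck

x = rng.normal(size=d)
summary = np.zeros(r)  # e.g., first block: no accumulated history yet
delta = routed_update(x, summary)
```

Because the atoms and keys are shared across blocks while `A` and `B` stay layer-local, the extra parameter cost is amortized over depth, which is how the method keeps a trainable-parameter count comparable to standard LoRA.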

Community

Paper author and submitter:

What if LoRA adapters could "choose" among different low-rank update patterns depending on the input, with text instructions regularizing this attention-based choice?
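The instruction-regularization the comment alludes to can be sketched as a language-induced prior added to the routing logits before the softmax. The embedding source and the cosine-similarity form below are assumptions for illustration; the paper only specifies that routing logits are augmented with a prior over update atoms.

```python
import numpy as np

rng = np.random.default_rng(1)
K, e = 8, 32  # number of atoms, text-embedding dimension (illustrative sizes)

# Hypothetical text embedding for each atom's semantic description,
# normalized so dot products are cosine similarities.
atom_text_emb = rng.normal(size=(K, e))
atom_text_emb /= np.linalg.norm(atom_text_emb, axis=1, keepdims=True)

def instruction_prior(instr_emb, temperature=1.0):
    """Language-induced prior over update atoms.

    Cosine similarity between the instruction embedding and each atom's
    description embedding, scaled by a temperature; added to the routing
    logits to bias selection toward semantically relevant atoms.
    """
    v = instr_emb / np.linalg.norm(instr_emb)
    return (atom_text_emb @ v) / temperature

instr = rng.normal(size=e)  # stand-in for an encoded text instruction
prior = instruction_prior(instr)
```

Because the prior only shifts logits rather than generating parameters, the update stays confined to the shared atom set — the "regularized choice" rather than an unconstrained hypernetwork.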


Get this paper in your agent:

hf papers read 2605.08423
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
