RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation
Abstract
Large language models (LLMs) fine-tuned for text retrieval have demonstrated state-of-the-art results across several information retrieval (IR) benchmarks. However, supervised training for improving these models requires numerous labeled examples, which are generally unavailable or expensive to acquire. In this work, we explore the effectiveness of extending reverse engineered adaptation to the context of information retrieval (RE-AdaptIR). We use RE-AdaptIR to improve LLM-based IR models using only unlabeled data. We demonstrate improved performance both in training domains as well as zero-shot in domains where the models have seen no queries. We analyze performance changes in various fine-tuning scenarios and offer findings of immediate use to practitioners.
Community
How can you improve text retrieval models with all that unlabeled data lying around? RE-AdaptIR extends reverse engineered adaptation to IR models. RepLLaMA and e5-Mistral are improved by using RE-AdaptIR to isolate the IR training from the base model's pretraining. Unlabeled data is then used to continue pretraining in the new domain. Finally, the model is readapted back to IR, with improved performance thanks to the additional pretraining (see the sketch below).
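Conceptually, this recipe can be viewed as weight arithmetic over checkpoints. The following is a minimal sketch under the assumption that the IR adapter is recovered as the weight difference between the retrieval fine-tune and its pretrained backbone, then re-applied after continued pretraining; the function names, checkpoint identifiers, and scaling factor are illustrative assumptions, not details taken from the paper.

```python
import torch

def reverse_engineer_adapter(finetuned_state, base_state):
    """Recover the IR adapter as the elementwise weight difference
    between the retrieval fine-tuned model and the pretrained backbone."""
    return {k: finetuned_state[k] - base_state[k] for k in base_state}

def readapt(domain_pretrained_state, ir_adapter, alpha=1.0):
    """Re-apply the reverse-engineered IR adapter on top of a backbone
    that has been further pretrained on unlabeled in-domain text.
    `alpha` is a hypothetical scaling knob for the adapter strength."""
    return {k: domain_pretrained_state[k] + alpha * ir_adapter[k]
            for k in domain_pretrained_state}

# Hypothetical usage (loader/training helpers are placeholders):
# base      = load_state_dict("mistral-7b-base")
# ir_model  = load_state_dict("e5-mistral-7b-instruct")
# adapter   = reverse_engineer_adapter(ir_model, base)          # isolate IR training
# domain    = continue_pretraining(base, unlabeled_domain_text)  # unlabeled data only
# readapted = readapt(domain, adapter, alpha=1.0)                # readapt back to IR
```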